Commits · 06d3d22b456c2f87aeb1eb4517eeabb47e21fcc9 · Kirill Smelkov / linux

01 Oct, 2012 31 commits

Btrfs: cleanup extents after we finish logging inode · 06d3d22b

Liu Bo authored Aug 27, 2012

This is based on Josef's "Btrfs: turbo charge fsync".

We should cleanup those extents after we've finished logging inode,
otherwise we may do redundant work on them.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

06d3d22b

Btrfs: only warn if we hit an error when doing the tree logging · 0fa83cdb

Josef Bacik authored Aug 24, 2012

I hit this a couple times while working on my fsync patch (all my bugs, not
normal operation), but with my new stuff we could have new errors from cases
I have not encountered, so instead of BUG()'ing we should be WARN()'ing so
that we are notified there is a problem but the user doesn't lose their
data. We can easily commit the transaction in the case that the tree
logging fails and still be fine, so let's try and be as nice to the user as
possible. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

0fa83cdb

Btrfs: turbo charge fsync · 5dc562c5

Josef Bacik authored Aug 17, 2012

At least for the vm workload.  Currently on fsync we will

1) Truncate all items in the log tree for the given inode if they exist

and

2) Copy all items for a given inode into the log

The problem with this is that for things like VMs you can have lots of
extents from the fragmented writing behavior, and worse yet you may have
only modified a few extents, not the entire thing.  This patch fixes this
problem by tracking which transid modified our extent, and then when we do
the tree logging we find all of the extents we've modified in our current
transaction, sort them and commit them.  We also only truncate up to the
xattrs of the inode and copy that stuff in normally, and then just drop any
extents in the range we have that exist in the log already.  Here are some
numbers of a 50 meg fio job that does random writes and fsync()s after every
write

		Original	Patched
SATA drive	82KB/s		140KB/s
Fusion drive	431KB/s		2532KB/s

So around 2-6 times faster depending on your hardware.  There are a few
corner cases, for example if you truncate at all we have to do it the old
way since there is no way to be sure what is in the log is ok.  This
probably could be done smarter, but if you write-fsync-truncate-write-fsync
you deserve what you get.  All this work is in RAM of course so if your
inode gets evicted from cache and you read it in and fsync it we'll do it
the slow way if we are still in the same transaction that we last modified
the inode in.

The biggest cool part of this is that it requires no changes to the recovery
code, so if you fsync with this patch and crash and load an old kernel, it
will run the recovery and be a-ok.  I have tested this pretty thoroughly
with an fsync tester and everything comes back fine, as well as xfstests.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

5dc562c5
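
The selection step described in this message (find the extents modified in the current transaction, then sort them before logging) can be sketched in userspace C. This is a minimal illustration, not the kernel implementation: `struct extent_rec`, `cmp_by_start`, and `collect_modified` are hypothetical names, and a flat array stands in for whatever in-memory structure the patch actually uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an in-memory extent record. */
struct extent_rec {
    uint64_t start;       /* file offset of the extent */
    uint64_t len;         /* length in bytes */
    uint64_t generation;  /* transid that last modified this extent */
};

/* qsort comparator: order extents by ascending file offset. */
static int cmp_by_start(const void *a, const void *b)
{
    const struct extent_rec *x = a;
    const struct extent_rec *y = b;

    if (x->start < y->start)
        return -1;
    if (x->start > y->start)
        return 1;
    return 0;
}

/*
 * Copy into out[] every extent whose generation equals cur_transid,
 * then sort the result by file offset. out[] must have room for n
 * entries. Returns the number of extents selected.
 */
size_t collect_modified(const struct extent_rec *all, size_t n,
                        uint64_t cur_transid, struct extent_rec *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (all[i].generation == cur_transid)
            out[count++] = all[i];
    }
    qsort(out, count, sizeof(*out), cmp_by_start);
    return count;
}
```

Only the selected extents would then be written to the log, which is why a file with many extents but few recent modifications benefits most.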

Btrfs: fix possible corruption when fsyncing written prealloced extents · 224ecce5

Josef Bacik authored Aug 16, 2012

While working on my fsync patch my fsync tester kept hitting mismatching
md5sums when I would randomly write to a prealloc'ed region, syncfs() and
then write to the prealloced region some more and then fsync() and then
immediately reboot. This is because the tree logging code will skip writing
csums for file extents whose generation is less than the current running
transaction. When we mark extents as written we haven't been updating their
generation so they were always being skipped. This wouldn't happen if you
were to preallocate and then write in the same transaction, but if you for
example prealloced a VM you could definitely run into this problem. This
patch makes my fsync tester happy again. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

224ecce5

Btrfs: do not allocate chunks as agressively · 54338b5c

Josef Bacik authored Aug 14, 2012

Swinging this pendulum back the other way. We've been allocating chunks up
to 2% of the disk no matter how much we actually have allocated. So instead
fix this calculation to only allocate chunks if we have more than 80% of the
space available allocated. Please test this as it will likely cause all
sorts of ENOSPC problems to pop up suddenly. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

54338b5c
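
The 80% rule stated above can be written as a small predicate. This is a sketch under assumptions: the function name `should_alloc_chunk` and its two parameters are hypothetical, and the real kernel calculation involves more state than a single used/total pair.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when strictly more than 80% of total_bytes is already
 * allocated. The threshold floor(4 * total / 5) is computed from the
 * quotient and remainder of total / 5 so that it cannot overflow.
 */
bool should_alloc_chunk(uint64_t allocated_bytes, uint64_t total_bytes)
{
    uint64_t threshold;

    if (total_bytes == 0)
        return false;

    threshold = (total_bytes / 5) * 4 + ((total_bytes % 5) * 4) / 5;
    return allocated_bytes > threshold;
}
```

For example, with 100 units of space the predicate is false at 80 units allocated and true at 81.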

Btrfs: update last trans if we don't update the inode · 7c735313

Josef Bacik authored Aug 13, 2012

There is a completely impossible situation to hit where you can preallocate
a file, fsync it, write into the preallocated region, have the transaction
commit twice and then fsync and then immediately lose power and lose all of
the contents of the write. This patch fixes this just so I feel better
about the situation and because it is lightweight, we just update the
last_trans when we finish an ordered IO and we don't update the inode
itself. This way we are completely safe and I feel better. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

7c735313

Btrfs: fix gcc warnings for 32bit compiles · 995e01b7

Jan Schmidt authored Aug 13, 2012

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

995e01b7

Btrfs: fix btrfs send for inline items and compression · 74dd17fb

Chris Mason authored Aug 07, 2012

The btrfs send code was assuming the offset of the file item into the
extent translated to bytes on disk.  If we're compressed, this isn't
true, and so it was off into extents owned by other files.

It was also improperly handling inline extents.  This solves a crash
where we may have gone past the end of the file extent item by not
testing early enough for an inline extent.  It also solves problems
where we have a hole between the end of the inline item and the start
of the full extent.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

74dd17fb

Btrfs: don't treat top/root directory inode as deleted/reused · 6d85ed05

Alexander Block authored Aug 01, 2012

We can't do the deleted/reused logic for top/root inodes as it would
create a stream that tries to delete and recreate the root dir.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

6d85ed05

Btrfs: ignore non-FS inodes for send/receive · 2981e225

Alexander Block authored Aug 01, 2012

We have to ignore inode/space cache objects in send/receive.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

2981e225

Btrfs: pass root instead of parent_root to iterate_inode_ref · 2f28f478

Alexander Block authored Aug 01, 2012

We need to pass the root that we determined earlier to iterate_inode_ref.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

2f28f478

Btrfs: use <= instead of < in is_extent_unchanged · d8347fa4

Alexander Block authored Aug 01, 2012

Used the wrong compare operator here.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

d8347fa4

Btrfs: fix check for changed extent in is_extent_unchanged · 3954096d

Alexander Block authored Aug 01, 2012

The previous check was working fine, but this check should be
easier to read. Also, we could theoretically have some exotic
bugs with the previous checks.
Signed-off-by: Alexander Block <ablock84@googlemail.com>

3954096d

Btrfs: free nce and nce_head on error in name_cache_insert · 5dc67d0b

Alexander Block authored Aug 01, 2012

Both were leaked in case of error.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

5dc67d0b

Btrfs: remove unused tmp_path from iterate_dir_item · 3e126f32

Alexander Block authored Aug 01, 2012

A leftover from older code and unused now.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

3e126f32

Btrfs: code cleanups for send/receive · e938c8ad

Alexander Block authored Jul 28, 2012

Doing some code cleanups as suggested by Arne.
Changes do not change any logic.
Signed-off-by: Alexander Block <ablock84@googlemail.com>

e938c8ad

Btrfs: add/fix comments/documentation for send/receive · 766702ef

Alexander Block authored Jul 28, 2012

As the subject already said, add/fix comments.
Signed-off-by: Alexander Block <ablock84@googlemail.com>

766702ef

Btrfs: update send_progress at correct places · e479d9bb

Alexander Block authored Jul 28, 2012

Updating send_progress in process_recorded_refs was not correct.
It got updated too early in the cur_inode_new_gen case.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

e479d9bb

Btrfs: make aux field of ulist 64 bit · 34d73f54

Alexander Block authored Jul 28, 2012

Btrfs send/receive uses the aux field to store inode numbers. On
32 bit machines this may become a problem.

Also fix all users of ulist_add and ulist_add_merged.
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

34d73f54

Btrfs: fix use of radix_tree for name_cache in send/receive · 7e0926fe

Alexander Block authored Jul 28, 2012

We can't easily use the index of the radix tree for inums as the
radix tree uses 32bit indexes on 32bit kernels. For 32bit kernels,
we now use the lower 32bit of the inum as index and an additional
list to store multiple entries per radix tree entry.
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

7e0926fe
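
The indexing scheme described above (use the lower 32 bits of the inode number as the index and keep a list per index for collisions) can be sketched as follows. This is an illustration only: a small fixed array stands in for the radix tree, and `struct nc_entry`, `struct name_cache_sketch`, `NC_SLOTS`, and the `nc_*` functions are hypothetical names, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NC_SLOTS 64  /* hypothetical size; a stand-in for the radix tree */

struct nc_entry {
    uint64_t inum;           /* full 64-bit inode number */
    const char *name;        /* cached name (not owned by the cache) */
    struct nc_entry *next;   /* other entries sharing the same index */
};

struct name_cache_sketch {
    struct nc_entry *slots[NC_SLOTS];
};

/* The index keeps only the lower 32 bits of the inode number. */
static uint32_t nc_index(uint64_t inum)
{
    return (uint32_t)inum;
}

/* Insert an entry at the head of its slot's list. Returns 0 or -1. */
int nc_insert(struct name_cache_sketch *nc, uint64_t inum, const char *name)
{
    size_t slot = nc_index(inum) % NC_SLOTS;
    struct nc_entry *e = malloc(sizeof(*e));

    if (!e)
        return -1;
    e->inum = inum;
    e->name = name;
    e->next = nc->slots[slot];
    nc->slots[slot] = e;
    return 0;
}

/* Walk the slot's list and compare the full 64-bit inode number. */
const char *nc_lookup(const struct name_cache_sketch *nc, uint64_t inum)
{
    const struct nc_entry *e = nc->slots[nc_index(inum) % NC_SLOTS];

    for (; e; e = e->next) {
        if (e->inum == inum)
            return e->name;
    }
    return NULL;
}

/* Free every entry in every slot and reset the slots. */
void nc_free(struct name_cache_sketch *nc)
{
    for (size_t i = 0; i < NC_SLOTS; i++) {
        struct nc_entry *e = nc->slots[i];

        while (e) {
            struct nc_entry *next = e->next;
            free(e);
            e = next;
        }
        nc->slots[i] = NULL;
    }
}
```

Two inode numbers such as 0x5 and 0x100000005 share the same 32-bit index, so the per-index list and the full 64-bit comparison in the lookup are what keep them apart.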

Btrfs: fix memory leak for name_cache in send/receive · 17589bd9

Alexander Block authored Jul 28, 2012

When everything is done, name_cache_free is called which however
forgot to call kfree on the cache entries.
Signed-off-by: Alexander Block <ablock84@googlemail.com>

17589bd9

Btrfs: don't break in the final loop of find_extent_clone · adbe7fb6

Alexander Block authored Jul 28, 2012

If we break, we may miss the clone from send_root which we prefer
over all other clones.

Commit is a result of Arne's review.
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

adbe7fb6

Btrfs: use normal return path for root == send_root case · 52f9e53e

Alexander Block authored Jul 28, 2012

Don't have a separate return path for the mentioned case. Now
we do the same "take lowest inode/offset" logic for all found clones.

Commit is a result of Arne's review.
Signed-off-by: Alexander Block <ablock84@googlemail.com>

52f9e53e

Btrfs: use kmalloc instead of stack for backref_ctx · 35075bb0

Alexander Block authored Jul 28, 2012

Make sure to never get in trouble due to the backref_ctx
which was on the stack before.

Commit is a result of Arne's review.
Signed-off-by: Alexander Block <ablock84@googlemail.com>

35075bb0

Btrfs: rename backref_ctx::found_in_send_root to found_itself · ee849c04

Alexander Block authored Jul 28, 2012

The new name should be easier to understand/read.

Commit is a result of Arne's review.
Signed-off-by: Alexander Block <ablock84@googlemail.com>

ee849c04

Btrfs: remove unused use_list from send/receive code · d27aed5e

Alexander Block authored Jul 28, 2012

use_list is a leftover and unused.
Signed-off-by: Alexander Block <ablock84@googlemail.com>

d27aed5e

Btrfs: add correct parent to check_dirs when dir got moved · ccf1626b

Alexander Block authored Jul 28, 2012

We only added the parent for the new position of a moved dir.
We also need to add the old parent of the moved dir.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

ccf1626b

Btrfs: remove unused code with #if 0 · 9ea3ef51

Alexander Block authored Jul 28, 2012

fs_path_remove is not used at the moment due to a previous patch.
Remove it for now (with #if 0) to avoid compile warnings.
Signed-off-by: Alexander Block <ablock84@googlemail.com>

9ea3ef51

Btrfs: add missing check for dir != tmp_dir to is_first_ref · b9291aff

Alexander Block authored Jul 28, 2012

We missed that check which resulted in all refs with the same name
being reported as first_ref.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

b9291aff

Btrfs: fix cur_ino < parent_ino case for send/receive · 1f4692da

Alexander Block authored Jul 28, 2012

When the current inode's inum is smaller than the inum of the
parent directory strange things were happening due to wrong
path resolution and other bugs. Fix this with a new approach
for the problem.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>

1f4692da

Btrfs: add rdev to get_inode_info in send/receive · 85a7b33b

Alexander Block authored Jul 26, 2012

We need rdev in the next commit.
Signed-off-by: Alexander Block <ablock84@googlemail.com>

85a7b33b

30 Sep, 2012 2 commits

Linux 3.6 · a0d271cb

Linus Torvalds authored Sep 30, 2012

a0d271cb

vfs: dcache: fix deadlock in tree traversal · 8110e16d

Miklos Szeredi authored Sep 17, 2012

IBM reported a deadlock in select_parent().  This was found to be caused
by taking rename_lock when already locked when restarting the tree
traversal.

There are two cases when the traversal needs to be restarted:

 1) concurrent d_move(); this can only happen when not already locked,
    since taking rename_lock protects against concurrent d_move().

 2) racing with final d_put() on child just at the moment of ascending
    to parent; rename_lock doesn't protect against this rare race, so it
    can happen when already locked.

Because of case 2, we need to be able to handle restarting the traversal
when rename_lock is already held.  This patch fixes all three callers of
try_to_ascend().

IBM reported that the deadlock is gone with this patch.

[ I rewrote the patch to be smaller and just do the "goto again" if the
  lock was already held, but credit goes to Miklos for the real work.
   - Linus ]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8110e16d

29 Sep, 2012 2 commits

Merge tag 'iommu-fixes-v3.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 6a3e3dbe

Linus Torvalds authored Sep 29, 2012

Pull IOMMU fixes from Joerg Roedel:
 "Two small patches:

	* One patch to fix the function declarations for
	  !CONFIG_IOMMU_API. This is causing build errors
	  in linux-next and should be fixed for v3.6.

	* Another patch to fix an IOMMU group related NULL pointer
	  dereference."

* tag 'iommu-fixes-v3.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix wrong assumption in iommu-group specific code
  iommu: static inline iommu group stub functions

6a3e3dbe

Merge git://git.infradead.org/users/willy/linux-nvme · 21e98932

Linus Torvalds authored Sep 29, 2012

Pull NVMe driver fixes from Matthew Wilcox:
 "Now that actual hardware has been released (don't have any yet
  myself), people are starting to want some of these fixes merged."

Willy doesn't have hardware? Guys...

* git://git.infradead.org/users/willy/linux-nvme:
  NVMe: Cancel outstanding IOs on queue deletion
  NVMe: Free admin queue memory on initialisation failure
  NVMe: Use ida for nvme device instance
  NVMe: Fix whitespace damage in nvme_init
  NVMe: handle allocation failure in nvme_map_user_pages()
  NVMe: Fix uninitialized iod compiler warning
  NVMe: Do not set IO queue depth beyond device max
  NVMe: Set block queue max sectors
  NVMe: use namespace id for nvme_get_features
  NVMe: replace nvme_ns with nvme_dev for user admin
  NVMe: Fix nvme module init when nvme_major is set
  NVMe: Set request queue logical block size

21e98932

28 Sep, 2012 5 commits

mtdchar: fix offset overflow detection · 9c603e53

Linus Torvalds authored Sep 08, 2012

Sasha Levin has been running trinity in a KVM tools guest, and was able
to trigger the BUG_ON() at arch/x86/mm/pat.c:279 (verifying the range of
the memory type).  The call trace showed that it was mtdchar_mmap() that
created an invalid remap_pfn_range().

The problem is that mtdchar_mmap() does various really odd and subtle
things with the vma page offset etc, and uses the wrong types (and the
wrong overflow) detection for it.

For example, the page offset may well be 32-bit on a 32-bit
architecture, but after shifting it up by PAGE_SHIFT, we need to use a
potentially 64-bit resource_size_t to correctly hold the full value.

Also, we need to check that the vma length plus offset doesn't overflow
before we check that it is smaller than the length of the mtdmap region.

This fixes things up and tries to make the code a bit easier to read.
Reported-and-tested-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9c603e53
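
The two checks described in this message (widen the page offset before shifting, and test for overflow of offset plus length before comparing against the region size) can be sketched in plain C. This is an illustration under assumptions, not the code from the patch: `mmap_range_ok` and `SKETCH_PAGE_SHIFT` are hypothetical names, a 4 KiB page size is assumed, and `uint32_t`/`uint64_t` stand in for a 32-bit page offset and a 64-bit `resource_size_t`.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/*
 * Return true when the byte range [off, off + vma_len) lies inside a
 * region of region_len bytes, where off is pgoff converted to bytes.
 */
bool mmap_range_ok(uint32_t pgoff, uint64_t vma_len, uint64_t region_len)
{
    /* Widen first: shifting a 32-bit value would drop the high bits. */
    uint64_t off = (uint64_t)pgoff << SKETCH_PAGE_SHIFT;

    /* Would off + vma_len wrap around? Check before adding. */
    if (vma_len > UINT64_MAX - off)
        return false;

    /* Only now is the sum safe to compare against the region size. */
    return off + vma_len <= region_len;
}
```

With a page offset of 0x100000, the byte offset is 4 GiB, which does not fit in 32 bits; performing the shift in the narrower type would yield 0 and let an out-of-range mapping pass the size check.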

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 6672d90f

Linus Torvalds authored Sep 28, 2012

Pull networking fixes from David S Miller:

 1) Netfilter xt_limit module can use uninitialized rules, from Jan
    Engelhardt.

 2) Wei Yongjun has found several more spots where error pointers were
    treated as NULL/non-NULL and vice versa.

 3) bnx2x was converted to pci_io{,un}map() but one remaining plain
    iounmap() got missed.  From Neil Horman.

 4) Due to a fence-post type error in initialization of inetpeer entries
    (which is where we store the ICMP rate limiting information), we can
    erroneously drop ICMPs if the inetpeer was created right around when
    jiffies wraps.

    Fix from Nicolas Dichtel.

 5) smsc75xx resume fix from Steve Glendinning.

 6) LAN87xx smsc chips need an explicit hardware init, from Marek Vasut.

 7) qlcnic uses msleep() with locks held, fix from Narendra K.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  netdev: octeon: fix return value check in octeon_mgmt_init_phy()
  inetpeer: fix token initialization
  qlcnic: Fix scheduling while atomic bug
  bnx2: Clean up remaining iounmap
  net: phy: smsc: Implement PHY config_init for LAN87xx
  smsc75xx: fix resume after device reset
  netdev: pasemi: fix return value check in pasemi_mac_phy_init()
  team: fix return value check
  l2tp: fix return value check
  netfilter: xt_limit: have r->cost != 0 case work

6672d90f

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 7596824e

Linus Torvalds authored Sep 28, 2012

Pull vfs fixes from Al Viro:
 "A couple of fixes; one for automount/lazy umount race, another a
  classic "we don't protect the refcount transition to zero with the
  lock that protects looking for object in hash" kind of crap in lockd."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  close the race in nlmsvc_free_block()
  do_add_mount()/umount -l races

7596824e

Merge branch 'for-linus-3.6-rc-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · 97956605

Linus Torvalds authored Sep 28, 2012

Pull UML fixes from Richard Weinberger.

* 'for-linus-3.6-rc-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Preinclude include/linux/kern_levels.h
  um: Fix IPC on um
  um: kill thread->forking
  um: let signal_delivered() do SIGTRAP on singlestepping into handler
  um: don't leak floating point state and segment registers on execve()
  um: take cleaning singlestep to start_thread()

97956605

Merge tag 'dm-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm · c3a086e6

Linus Torvalds authored Sep 28, 2012

Pull dm fixes from Alasdair G Kergon:
 "A few fixes for problems discovered during the 3.6 cycle.

  Of particular note, are fixes to the thin target's discard support,
  which I hope is finally working correctly; and fixes for multipath
  ioctls and device limits when there are no paths."

* tag 'dm-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm verity: fix overflow check
  dm thin: fix discard support for data devices
  dm thin: tidy discard support
  dm: retain table limits when swapping to new table with no devices
  dm table: clear add_random unless all devices have it set
  dm: handle requests beyond end of device instead of using BUG_ON
  dm mpath: only retry ioctl when no paths if queue_if_no_path set
  dm thin: do not set discard_zeroes_data

c3a086e6