Commits · 61ecda68652591c3a7131e6bdb51639612a1244c · Kirill Smelkov / linux

22 Jan, 2018 40 commits

btrfs: remove check for BTRFS_FS_STATE_ERROR which we just set · 61ecda68

Anand Jain authored Jan 04, 2018

__btrfs_handle_fs_error() sets BTRFS_FS_STATE_ERROR and then calls
btrfs_handle_error(), so there is no need to check whether
BTRFS_FS_STATE_ERROR is set in btrfs_handle_error(). There is no other
user of btrfs_handle_error() either.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

61ecda68

Btrfs: make raid6 rebuild retry more · 8810f751

Liu Bo authored Jan 02, 2018

There is a scenario in which the rebuild process fails to return good
content. Suppose that all disks can be read without problems but the
content that was read out doesn't match its checksum. Currently, for
raid6, btrfs retries at most twice:

- the 1st retry is to rebuild with all other stripes, it'll eventually
  be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
  that it will do raid6 style rebuild,

However, chances are that another non-parity stripe is also corrupted,
so that the above retries are not able to return correct content, and
users will perceive this as data loss. More seriously, if the loss
happens on some important internal btree roots, the filesystem could
refuse to mount.

This extends btrfs to do more retries and each retry fails only one
stripe.  Since raid6 can tolerate 2 disk failures, if there is one
more failure besides the failure on which we're recovering, this can
always work.

The worst case is to retry as many times as the number of raid6 disks,
but given the fact that such a scenario is really rare in practice,
it's still acceptable.
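
The retry policy described above can be sketched as a loop that fails one additional stripe per attempt. This is a toy model, not the btrfs code: `toy_rebuild_ok()` and `toy_retry_rebuild()` are hypothetical names, and a rebuild is assumed to succeed exactly when every corrupted stripe is among the (at most two) stripes treated as failed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (assumption): a rebuild succeeds when every corrupted stripe
 * is one of the stripes treated as failed. failb == -1 means that no
 * second stripe is failed.
 */
static bool toy_rebuild_ok(const bool *corrupted, int nr_stripes,
			   int faila, int failb)
{
	for (int i = 0; i < nr_stripes; i++) {
		if (corrupted[i] && i != faila && i != failb)
			return false;
	}
	return true;
}

/*
 * Rebuild stripe 'faila', failing one additional stripe per retry.
 * Returns the number of attempts used, or -1 if no attempt succeeded.
 */
static int toy_retry_rebuild(const bool *corrupted, int nr_stripes, int faila)
{
	int attempts = 0;

	assert(faila >= 0 && faila < nr_stripes);

	/* First attempt: only the stripe being recovered is failed. */
	attempts++;
	if (toy_rebuild_ok(corrupted, nr_stripes, faila, -1))
		return attempts;

	/* Later attempts: additionally fail exactly one other stripe. */
	for (int failb = 0; failb < nr_stripes; failb++) {
		if (failb == faila)
			continue;
		attempts++;
		if (toy_rebuild_ok(corrupted, nr_stripes, faila, failb))
			return attempts;
	}
	return -1;
}
```

In this model the worst case uses as many attempts as there are stripes, and a third corrupted stripe makes every attempt fail, which matches the two-failure tolerance mentioned above.
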
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

8810f751

Btrfs: fix scrub to repair raid6 corruption · 762221f0

Liu Bo authored Jan 02, 2018

The raid6 corruption scenario is as follows. Suppose that all disks can
be read without problems but the content that was read out doesn't match
its checksum. Currently, for raid6, btrfs retries at most twice:

- the 1st retry is to rebuild with all other stripes, it'll eventually
  be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
  that it will do raid6 style rebuild,

However, chances are that another non-parity stripe is also corrupted,
so that the above retries are not able to return correct content.

We've fixed normal reads to rebuild raid6 correctly with more retries
in the patch "Btrfs: make raid6 rebuild retry more"[1]. This patch fixes
scrub to do exactly the same rebuild process.

[1]: https://patchwork.kernel.org/patch/10091755/
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

762221f0

btrfs: factor btrfs_check_rw_degradable() to check given device · 6528b99d

Anand Jain authored Dec 18, 2017

Update btrfs_check_rw_degradable() to check against the given device if
it's lost.

We can use this function to know whether the volume is going to be in
degraded mode or in a failed state when the given device fails. This is
needed when we are handling the device failed state.

This is a preparatory patch and does not affect the flow as such.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ enhance comment ]
Signed-off-by: David Sterba <dsterba@suse.com>

6528b99d

btrfs: sink unlock_extent parameter gfp_flags · e43bbe5e

David Sterba authored Dec 12, 2017

All callers pass either GFP_NOFS or GFP_KERNEL now, so we can sink the
parameter to the function. Though we lose some of the slightly better
semantics of GFP_KERNEL in some places, it's worth cleaning up the
callchains.
Signed-off-by: David Sterba <dsterba@suse.com>

e43bbe5e

btrfs: add separate helper for unlock_extent_cached with GFP_ATOMIC · d810a4be

David Sterba authored Dec 07, 2017

There's only one instance where we pass a different gfp mask to
unlock_extent_cached. Add a separate helper for that and then we can
drop the gfp parameter from unlock_extent_cached.
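
The refactoring pattern can be illustrated with a small user-space sketch. Every name here (`toy_gfp`, `toy_extent`, `toy_unlock_common()` and the two wrappers) is hypothetical and only mirrors the idea: the common case loses its mask parameter, and the single odd caller gets its own helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for allocation masks; not the kernel's gfp_t. */
enum toy_gfp { TOY_GFP_NOFS, TOY_GFP_ATOMIC };

struct toy_extent {
	int locked;
	enum toy_gfp last_mask;	/* records which mask the unlock used */
};

/* Common worker: the only place that still takes a mask. */
static int toy_unlock_common(struct toy_extent *e, enum toy_gfp mask)
{
	assert(e != NULL);
	e->locked = 0;
	e->last_mask = mask;
	return 0;
}

/* Default helper: the mask is no longer a parameter. */
static int toy_unlock_cached(struct toy_extent *e)
{
	return toy_unlock_common(e, TOY_GFP_NOFS);
}

/* Separate helper for the single caller that needs a different mask. */
static int toy_unlock_cached_atomic(struct toy_extent *e)
{
	return toy_unlock_common(e, TOY_GFP_ATOMIC);
}
```

Callers then pick a helper by name instead of threading a mask through every call chain.
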
Signed-off-by: David Sterba <dsterba@suse.com>

d810a4be

btrfs: drop unused parameters from mount_subvol · 5bedc48a

David Sterba authored Jan 02, 2018

Recent patches reworking the mount path left some unused parameters. We
pass a vfsmount to mount_subvol; the flags and data (i.e. mount options)
have already been applied and we will not need them.
Signed-off-by: David Sterba <dsterba@suse.com>

5bedc48a

btrfs: cleanup unnecessary string dup in btrfs_parse_options() · e215772c

Misono, Tomohiro authored Dec 14, 2017

Long ago, commit edf24abe ("btrfs: sanity mount option parsing and
early mount code") split the btrfs_parse_options() into two parts
(btrfs_parse_early_options() and btrfs_parse_options()). As a result,
btrfs_parse_options() no longer gets called twice and is the last one to
parse the mount option string. Therefore there is no need to dup it.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

e215772c

Btrfs: remove unused wait in btrfs_stripe_hash · 203e02d9

Liu Bo authored Dec 22, 2017

In fact nobody is waiting on @wait's waitqueue, so it can be safely
removed.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

203e02d9

btrfs: Remove redundant pair of bio_get/set in __btrfs_submit_dio_bio · 36f7894f

Nikolay Borisov authored Dec 13, 2017

The bio is not referenced after it has been submitted and the endio is
going to consume the sole reference on successful submission. On error,
the callers of __btrfs_submit_dio_bio do invoke bio_put so we don't
leak it either.
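
The ownership argument can be modelled with a toy reference count. This is a user-space sketch under stated assumptions, not the block layer API: `toy_bio` and all `toy_*` functions are hypothetical, and completion is simulated synchronously.

```c
#include <assert.h>

/* Toy stand-in for a refcounted bio; not the kernel's struct bio. */
struct toy_bio {
	int refs;
	int completed;
};

static void toy_bio_init(struct toy_bio *bio)
{
	bio->refs = 1;		/* a new bio starts with a single reference */
	bio->completed = 0;
}

static void toy_bio_put(struct toy_bio *bio)
{
	assert(bio->refs > 0);	/* catches a double put */
	bio->refs--;
}

/* Completion handler: consumes the sole reference. */
static void toy_endio(struct toy_bio *bio)
{
	bio->completed = 1;
	toy_bio_put(bio);
}

/* Submit without an extra get/put pair: the bio is not touched afterwards. */
static int toy_submit(struct toy_bio *bio, int simulate_error)
{
	if (simulate_error)
		return -1;	/* nothing was submitted, endio will not run */
	toy_endio(bio);		/* the toy completes synchronously */
	return 0;
}

/* Caller: drops the reference itself only when submission failed. */
static int toy_caller(struct toy_bio *bio, int simulate_error)
{
	int ret = toy_submit(bio, simulate_error);

	if (ret)
		toy_bio_put(bio);
	return ret;
}
```

On both paths the count drops from 1 to 0 exactly once, so an additional get/put pair around the submission would add nothing.
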
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

36f7894f

btrfs: Remove redundant bio_get/bio_set pair from submit_one_bio · ffc9c8dd

Nikolay Borisov authored Dec 13, 2017

The bio is never referenced after it has been submitted so there is no
point in getting an extra reference.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ffc9c8dd

btrfs: Remove redundant bio_get/set from submit_dio_repair_bio · ea057f6d

Nikolay Borisov authored Dec 13, 2017

The bio that is passed is the newly created repair bio which already
has a reference count of 1, which is going to be consumed by the
endio routine on successful submission. On error the handler also
calls bio_put.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ea057f6d

btrfs: Remove redundant bio_get/set calls in compressed read/write paths · 32506af5

Nikolay Borisov authored Dec 13, 2017

bio_get/set is necessary only if the bio is going to be referenced
following submissions. In the code paths where such calls are made
we don't really need them since the bio is referenced only if
btrfs_map_bio returns an error. And this function can return an error
prior to submission only. So referencing the bio is safe. Furthermore
we do call bio_endio which will consume the last reference. So let's
remove the redundant calls.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

32506af5

btrfs: Improve btrfs_search_slot description · 4271ecea

Nikolay Borisov authored Dec 13, 2017

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

4271ecea

btrfs: heuristic: call get4bits directly · 36243c91

David Sterba authored Dec 12, 2017

As it's a single instance and local to the file, we don't need to pass
it as an argument.
Reviewed-by: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

36243c91

btrfs: heuristic: open code copy_call callback of radix sort · 7add17be

David Sterba authored Dec 12, 2017

The callback is trivial and we don't need the abstraction for our
purposes. Let's open code it.
Reviewed-by: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

7add17be

btrfs: heuristic: open code get_num callback of radix sort · 23ae8c63

David Sterba authored Dec 12, 2017

The callback is trivial and we don't need the abstraction for our
purposes. Let's open code it and also make the array types explicit.
Reviewed-by: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

23ae8c63

btrfs: remove unused arg from parse_subvol_options() · 78f6beac

Misono, Tomohiro authored Jan 17, 2018

Remove the unused arg 'holder' from parse_subvol_options(), which was
left behind by commit b99beb110e2d ("btrfs: split
parse_early_options() in two").
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

78f6beac

btrfs: remove unused setup_root_args() · 83085935

Misono, Tomohiro authored Dec 14, 2017

Since setup_root_args() is not used anymore, just remove it.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

83085935

btrfs: split parse_early_options() in two · d7407606

Misono, Tomohiro authored Dec 14, 2017

Now parse_early_options() is used by both btrfs_mount() and
btrfs_mount_root(). However, the former only needs subvol related part
and the latter needs the others.

Therefore extract the subvol related parts from parse_early_options() and
move them to a new parse function (parse_subvol_options()).
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

d7407606

btrfs: cleanup btrfs_mount() using btrfs_mount_root() · 312c89fb

Misono, Tomohiro authored Dec 14, 2017

Cleanup btrfs_mount() by using btrfs_mount_root(). This avoids getting
btrfs_mount() called twice in mount path.

The old btrfs_mount() does the following:
0. The VFS layer calls vfs_kern_mount() with the registered file_system_type
   (for btrfs, btrfs_fs_type). btrfs_mount() is called on the way.
1. btrfs_parse_early_options() parses the "subvolid=" mount option and sets
   the value to subvol_objectid. Otherwise, subvol_objectid has the initial
   value of 0.
2. Check whether subvol_objectid is 5. Assume this time the id is not 5;
   then btrfs_mount() returns by calling mount_subvol().
3. In mount_subvol(), the original mount options are modified to contain
   "subvolid=0" in setup_root_args(). Then vfs_kern_mount() is called with
   btrfs_fs_type and the new options.
4. btrfs_mount() is called again.
5. btrfs_parse_early_options() parses "subvolid=0" and sets subvol_objectid
   to 5 (instead of 0).
6. Check whether subvol_objectid is 5. This time the id is 5 and
   mount_subvol() is not called. btrfs_mount() finishes mounting a root.
7. (In mount_subvol()) using the return value of vfs_kern_mount(), it
   calls mount_subtree().
8. Return the subvolume's dentry.

Reusing the same file_system_type (and btrfs_mount()) for vfs_kern_mount()
is the cause of complication.

Instead, new btrfs_mount() will do:
1. parse subvol id related options for later use in mount_subvol()
2. mount device's root by calling vfs_kern_mount() with
   btrfs_root_fs_type, which is not registered to VFS by
   register_filesystem(). As a result, btrfs_mount_root() is called
3. return by calling mount_subvol()

The code of 2. is moved from the first part of mount_subvol().

The semantics of device holder changes from btrfs_fs_type to
btrfs_root_fs_type and has to be used in all contexts. Otherwise we'd
get wrong results when mount and dev scan would not check the same
thing. (This has been found independently and the fix is folded into this
patch.)
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ fold the btrfs_control_ioctl fixup, extend the comment ]
Signed-off-by: David Sterba <dsterba@suse.com>

312c89fb

btrfs: add btrfs_mount_root() and new file_system_type · 72fa39f5

Misono, Tomohiro authored Dec 14, 2017

Add btrfs_mount_root() and new file_system_type for preparation of cleanup
of btrfs_mount(). Code path is not changed yet.

btrfs_mount_root() is almost the same as current btrfs_mount(), but doesn't
have subvolume related part.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

72fa39f5

btrfs: unify extent_page_data type passed as void · aab6e9ed

David Sterba authored Nov 30, 2017

Functions called from extent_write_cache_pages used void* as generic
callback data, but all of them convert it to extent_page_data, or use it
directly.
Signed-off-by: David Sterba <dsterba@suse.com>

aab6e9ed

btrfs: sink writepage parameter to extent_write_cache_pages · 935db853

David Sterba authored Jun 23, 2017

The function extent_write_cache_pages is modelled after
write_cache_pages which is a generic interface and the writepage
parameter makes sense there. In btrfs we know exactly which callback
we're going to use, so we can pass it directly.
Signed-off-by: David Sterba <dsterba@suse.com>

935db853

btrfs: sink flush_fn to extent_write_cache_pages · 25b860e0

David Sterba authored Jun 23, 2017

All callers pass the same value flush_write_bio.
Signed-off-by: David Sterba <dsterba@suse.com>

25b860e0

btrfs: merge two flush_write_bio helpers · e2932ee0

David Sterba authored Jun 23, 2017

flush_epd_write_bio is the same as flush_write_bio, so there is no point
in having two such functions. Merge them into flush_write_bio. The
'noinline' attribute is removed as it does not have any meaning.
Signed-off-by: David Sterba <dsterba@suse.com>

e2932ee0

btrfs: Rename bin_search -> btrfs_bin_search · a74b35ec

Nikolay Borisov authored Dec 08, 2017

Currently there are two functions doing binary search on btrfs nodes:
bin_search and btrfs_bin_search. The latter is a simple wrapper for
the former. So eliminate the wrapper and just rename bin_search to
btrfs_bin_search. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

a74b35ec

btrfs: sink extent_write_full_page tree argument · 0a9b0e53

Nikolay Borisov authored Dec 08, 2017

The tree argument passed to extent_write_full_page is referenced from
the page being passed to the same function. Since we already have
enough information to get the reference, remove the function parameter.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

0a9b0e53

btrfs: sink extent_write_locked_range tree parameter · 5e3ee236

Nikolay Borisov authored Dec 08, 2017

This function is called only from submit_compressed_extents and the
io tree being passed is always that of the inode. But we are also
passing the inode, so just move getting the io tree pointer in
extent_write_locked_range to simplify the signature.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

5e3ee236

btrfs: Remove pair of bio_get/put in btrfs_schedule_bio · 3e798068

Nikolay Borisov authored Dec 11, 2017

This code was added in 492bb6de ("Btrfs: Hold a reference on bios
during submit_bio, add some extra bio checks"). However, holding a
reference on a bio is necessary only if it's going to be referenced
after the submit_bio returns and the bio is completed. In this
particular instance this is not the case so there is no need to hold
an extra reference since we directly return.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

3e798068

btrfs: Fix out of bounds access in btrfs_search_slot · 9ea2c7c9

Nikolay Borisov authored Dec 12, 2017

When modifying a tree where the root is at BTRFS_MAX_LEVEL - 1, the
level variable is going to be 7 (this is the max height of the
tree). On the other hand, btrfs_cow_block is always called with
"level + 1" as an index into the nodes and slots arrays. This leads to
an out of bounds access. Admittedly this will be benign since an OOB
access of the nodes array will likely read the 0th element from the
slots array, which in this case is going to be 0 (since we start CoW at
the top of the tree). The OOB access into the slots array in turn will
read the 0th and 1st values of the locks array, which would both be 0
at the time. However, this benign behavior relies on the fact that the
path being passed hasn't been initialised. If it has already been used to
query a btree, then it could potentially have populated the nodes/slots arrays.

Fix it by explicitly checking if we are at level 7 (the maximum allowed
index in nodes/slots arrays) and explicitly calling the CoW routine with
NULL for parent's node/slot.
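
The bounds check can be sketched in isolation as follows. This is a simplified user-space illustration, not the actual btrfs_search_slot() change: `toy_path`, `TOY_MAX_LEVEL` and `toy_parent_of()` are hypothetical names, and using 0 as the slot for the top level is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LEVEL 8	/* valid indexes are 0..7 */

struct toy_path {
	void *nodes[TOY_MAX_LEVEL];
	int slots[TOY_MAX_LEVEL];
};

/*
 * Fetch the parent node and slot for 'level' without ever indexing
 * nodes[]/slots[] at TOY_MAX_LEVEL. At the top level there is no parent,
 * so NULL and slot 0 are reported instead.
 */
static void toy_parent_of(const struct toy_path *p, int level,
			  void **parent, int *parent_slot)
{
	assert(level >= 0 && level < TOY_MAX_LEVEL);

	if (level + 1 < TOY_MAX_LEVEL) {
		*parent = p->nodes[level + 1];
		*parent_slot = p->slots[level + 1];
	} else {
		/* level == TOY_MAX_LEVEL - 1: level + 1 would be out of bounds */
		*parent = NULL;
		*parent_slot = 0;
	}
}
```

The caller would then hand `parent` and `parent_slot` to the CoW routine, so the top-level case never depends on whatever happens to follow the arrays in memory.
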
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Fixes-coverity-id: 711515
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

9ea2c7c9

btrfs: remove duplicate includes · 87c46ec7

Pravin Shedge authored Dec 06, 2017

These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.
Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

87c46ec7

btrfs: Handle btrfs_set_extent_delalloc failure in fixup worker · f3038ee3

Nikolay Borisov authored Dec 05, 2017

This function was introduced by 247e743c ("Btrfs: Use async helpers
to deal with pages that have been improperly dirtied") and it didn't do
any error handling then. This function might very well fail in an ENOMEM
situation, yet the failure is not handled, which could lead to
inconsistent state. So let's handle the failure by setting the mapping
error bit.
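
The error-handling pattern can be sketched as below. This is a toy model with hypothetical names (`toy_mapping`, `toy_set_delalloc()`, `toy_fixup()`); it only shows the idea of recording a failure on the mapping instead of ignoring the return value.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for an address space that can remember an error. */
struct toy_mapping {
	int error;
};

/* Hypothetical step that may fail under memory pressure. */
static int toy_set_delalloc(int simulate_enomem)
{
	return simulate_enomem ? -ENOMEM : 0;
}

/* Worker: checks the return value and records a failure on the mapping. */
static int toy_fixup(struct toy_mapping *mapping, int simulate_enomem)
{
	int ret;

	assert(mapping);

	ret = toy_set_delalloc(simulate_enomem);
	if (ret) {
		mapping->error = ret;	/* make the failure visible later */
		return ret;
	}
	return 0;
}
```
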

Cc: stable@vger.kernel.org
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

f3038ee3

btrfs: put btrfs_ioctl_vol_args_v2 related defines together · ad8bc4d0

Anand Jain authored Dec 06, 2017

Just a spatial rearrangement of the code, no functional change.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ad8bc4d0

btrfs: show options: use helper to convert compression type string · 0f628c63

David Sterba authored Oct 31, 2017

Use the helper. If the COMPRESS option is set, the result is always
defined and not empty.
Signed-off-by: David Sterba <dsterba@suse.com>

0f628c63

btrfs: prop: use common helper for type to string conversion · 802a5c69

David Sterba authored Oct 31, 2017

Use the helper for conversion, keep the semantics.
Signed-off-by: David Sterba <dsterba@suse.com>

802a5c69

btrfs: SETFLAGS ioctl: use helper for compression type conversion · 93370509

David Sterba authored Oct 31, 2017

Signed-off-by: David Sterba <dsterba@suse.com>

93370509

btrfs: compression: add helper for type to string conversion · e128f9c3

David Sterba authored Oct 31, 2017

There are several places open-coding this conversion. Add a helper now
that we have 3 compression algorithms.
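
A helper of this kind might look like the following user-space sketch. The enum, `toy_compress_type2str()` and `toy_show_compress()` are hypothetical names for illustration; returning NULL for "no compression" and the ",compress=" output format are assumptions of the sketch, not taken from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum toy_compress_type {
	TOY_COMPRESS_NONE,
	TOY_COMPRESS_ZLIB,
	TOY_COMPRESS_LZO,
	TOY_COMPRESS_ZSTD,
};

/* Single place that maps a compression type to its name. */
static const char *toy_compress_type2str(enum toy_compress_type type)
{
	switch (type) {
	case TOY_COMPRESS_ZLIB:
		return "zlib";
	case TOY_COMPRESS_LZO:
		return "lzo";
	case TOY_COMPRESS_ZSTD:
		return "zstd";
	case TOY_COMPRESS_NONE:
		break;
	}
	return NULL;	/* no name for "no compression" */
}

/* Example caller: formats a mount-option style string using the helper. */
static int toy_show_compress(char *buf, size_t len, enum toy_compress_type type)
{
	const char *name = toy_compress_type2str(type);

	assert(buf != NULL && len > 0);
	if (!name) {
		buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, ",compress=%s", name);
}
```

With one helper, adding a fourth algorithm means touching a single switch instead of every open-coded conversion.
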
Signed-off-by: David Sterba <dsterba@suse.com>

e128f9c3

btrfs: remove redundant check in btrfs_get_extent_fiemap · bf8d32b9

Nikolay Borisov authored Dec 01, 2017

Before returning hole_em in btrfs_get_extent_fiemap we check whether it
is non-NULL. However, by the time this NULL check is reached we already
know hole_em is not NULL, because it points to the em we found and it
has already been dereferenced.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

bf8d32b9

btrfs: Remove unused variable in btrfs_get_extent · 5c9a702e

Nikolay Borisov authored Dec 01, 2017

trans was statically assigned to NULL and this never changed over the
course of btrfs_get_extent. So remove any code which checks whether
trans != NULL and just hardcode the fact trans is always NULL.

Resolves-coverity-id: 112806
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

5c9a702e