- 11 Jan, 2016 2 commits
-
-
Chris Mason authored
Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 Signed-off-by: Chris Mason <clm@fb.com>
-
Chris Mason authored
Merge branch 'misc-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
-
- 07 Jan, 2016 30 commits
-
-
Sam Tygier authored
When converting a filesystem via balance, check that the metadata mode is at least as redundant as the data mode. For example, give a warning when: -dconvert=raid1 -mconvert=single Signed-off-by: Sam Tygier <samtygier@yahoo.co.uk> [ minor message reformatting ] Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
There is one ENOSPC case that's very confusing. Available is greater than zero, but no file operation succeeds (besides removing files). This happens when the metadata is exhausted and there's no possibility to allocate another chunk. In this scenario it's normal that there's still some space in the data chunk and the calculation in df reflects that in the Avail value. To at least give some clue about the ENOSPC situation, let statfs report a zero value in Avail, even if there's still data space available.

Current:
/dev/sdb1 4.0G 3.3G 719M 83% /mnt/test
New:
/dev/sdb1 4.0G 3.3G 0 100% /mnt/test

We calculate the remaining metadata space minus the global reserve. If this is (supposedly) smaller than zero, there's no space. But this does not hold in practice; the exhausted state happens while there's still some positive delta. So we apply some guesswork and compare the delta to a 4M threshold. (The practically observed delta was 2M.) We probably cannot calculate the exact threshold value because it depends on the internal reservations requested by various operations, so some operations that consume a little metadata will succeed even if Avail is zero. But this is better than the other way around. Signed-off-by: David Sterba <dsterba@suse.com>
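The clamping boils down to a small comparison in btrfs_statfs(); a minimal sketch, assuming total_free_meta holds the remaining metadata space and block_rsv is the global reserve (variable names illustrative, not the exact diff):

	/* metadata is effectively exhausted: report no available space */
	u64 thresh = 4 * 1024 * 1024;	/* guesswork threshold, observed delta was ~2M */

	if (total_free_meta - thresh < block_rsv->size)
		buf->f_bavail = 0;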
-
David Sterba authored
We can also preallocate the btrfs_path that's used during pending snapshot creation and avoid another late ENOMEM failure. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The actual snapshot creation is delayed until transaction commit. If we cannot get enough memory for the root item there, we have to fail the whole transaction commit, which is bad. So we'll allocate the memory at ioctl time and pass it along in the pending_snapshot struct. The potential ENOMEM will be returned to the caller of the snapshot ioctl. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can allocate pending_snapshot earlier and thus avoid having to do cleanup in case of failure. Signed-off-by: David Sterba <dsterba@suse.com>
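Taken together, the three changes above move the allocations to the ioctl path; a rough sketch of the resulting allocation prologue in create_snapshot() (ioctl.c), with error handling simplified and the GFP flag shown only for illustration:

	pending_snapshot = kzalloc(sizeof(*pending_snapshot), GFP_NOFS);
	if (!pending_snapshot)
		return -ENOMEM;

	pending_snapshot->root_item = kzalloc(sizeof(struct btrfs_root_item),
					      GFP_NOFS);
	pending_snapshot->path = btrfs_alloc_path();
	if (!pending_snapshot->root_item || !pending_snapshot->path) {
		ret = -ENOMEM;
		goto free_pending;	/* ENOMEM goes back to the ioctl caller */
	}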
-
David Sterba authored
The values of btrfs_path::locks are 0 to 4 and fit into a u8. Let's see:

* overall size of btrfs_path drops from 136 to 112 bytes (-24 bytes)
* better packing in a slab page, +6 objects
* the whole structure now fits into 2 cachelines
* slight decrease in code size:

   text    data     bss     dec     hex filename
 938731   43670   23144 1005545   f57e9 fs/btrfs/btrfs.ko.before
 938203   43670   23144 1005017   f55d9 fs/btrfs/btrfs.ko.after

(and the generated assembly does not change much)

The main purpose is to decrease the size of the structure without affecting performance. Byte access is usually well behaved across arches; the locks are not accessed frequently and are sometimes just compared to zero.

Note for further size reduction attempts: the slots could be made u16, but this might generate worse code on some arches (non-byte and non-int access). Also, the range of operations on slots is wider compared to locks, so the potential performance drop should be evaluated first. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The level is 0..7, so we can use a smaller type. The size of btrfs_path is now 136 bytes, down from 144, which means +2 objects fit into a 4k slab. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The possible values for reada are all positive and bounded, so we can later save some bytes by storing it in a u8. Signed-off-by: David Sterba <dsterba@suse.com>
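A trimmed sketch of how the affected btrfs_path members in ctree.h end up after the three shrinking patches above (other members and the bitfield flags omitted):

struct btrfs_path {
	struct extent_buffer *nodes[BTRFS_MAX_LEVEL];
	int slots[BTRFS_MAX_LEVEL];
	/* if there is real range locking, this locks field will change */
	u8 locks[BTRFS_MAX_LEVEL];
	u8 reada;
	/* keep some upper locks as we walk down */
	u8 lowest_level;
	/* ... bitfield flags ... */
};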
-
David Sterba authored
Replace the integers with enums for better readability. The value 2 has not had any meaning since a7175319 "Btrfs: do less aggressive btree readahead" (2009-01-22). Signed-off-by: David Sterba <dsterba@suse.com>
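The replacement is roughly the following enum plus its users (a sketch; READA_NONE/READA_BACK/READA_FORWARD are the names the btrfs code uses for btrfs_path::reada):

enum { READA_NONE = 0, READA_BACK, READA_FORWARD };

	/* callers then read naturally, e.g.: */
	path->reada = READA_FORWARD;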
-
David Sterba authored
There are a few statically initialized arrays that can be made const. The remaining ones (like file_system_type, sysfs attributes or prop handlers) do not allow that, due to type mismatches when passed to the APIs or because the structures are modified through other members. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
* struct extent_io_ops
* struct btrfs_free_space_op

Signed-off-by: David Sterba <dsterba@suse.com>
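Constifying these is just a matter of declaring the ops tables const at their definition; a hedged example of how the free space variant looks in free-space-cache.c (member names taken from the existing structure, treat the snippet as a sketch):

static const struct btrfs_free_space_op free_space_op = {
	.recalc_thresholds	= recalculate_thresholds,
	.use_bitmap		= use_bitmap,
};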
-
David Sterba authored
Preparatory work for making btrfs_free_space_op constant. In test_steal_space_from_bitmap_to_extent, we substitute use_bitmap with our own version, thus preventing constification. We can rework it so we replace the whole structure with one carrying the correct function pointers. Signed-off-by: David Sterba <dsterba@suse.com>
-
Geliang Tang authored
Use list_for_each_entry*() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Geliang Tang authored
Use list_for_each_entry_safe() instead of list_for_each_safe() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Geliang Tang authored
Use list_for_each_entry*() instead of list_for_each*() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
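The pattern behind these three cleanups is the same everywhere; a generic before/after sketch (struct foo and its list member are placeholders, not btrfs types):

	/* before: iterate raw list heads and convert by hand */
	struct list_head *pos, *n;
	list_for_each_safe(pos, n, &head) {
		struct foo *entry = list_entry(pos, struct foo, list);
		do_something(entry);
	}

	/* after: the iterator does the container_of() for us */
	struct foo *entry, *next;
	list_for_each_entry_safe(entry, next, &head, list) {
		do_something(entry);
	}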
-
Byongho Lee authored
We use many constants to represent size and offset values, and to make the code readable we use '256 * 1024 * 1024' instead of '268435456' to represent '256MB'. However, we can make it far more readable with 'SZ_256M', which is defined in 'linux/sizes.h'. So this patch replaces the 'xxx * 1024 * 1024' kind of expression with a single 'SZ_xxxM' if 'xxx' is a power of 2, or with 'xxx * SZ_1M' if it is not. I haven't touched '4096' & '8192' because they are more intuitive than 'SZ_4K' & 'SZ_8K'. Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
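An illustrative substitution (the constants shown are generic examples, not specific lines from the patch):

	#include <linux/sizes.h>

	/* before */
	cache->key.offset = 256 * 1024 * 1024;
	thresh = 96 * 1024 * 1024;

	/* after: power-of-2 sizes get the named macro, others use SZ_1M */
	cache->key.offset = SZ_256M;
	thresh = 96 * SZ_1M;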
-
David Sterba authored
Signed-off-by: David Sterba <dsterba@suse.com>
-
Alexandru Moise authored
It's slightly cleaner to zero out the delayed node upon allocation than to clear a few members by hand in btrfs_init_delayed_node(). Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
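In practice this is the usual switch to a zeroing allocation instead of kmem_cache_alloc() plus clearing members by hand; a one-line sketch, assuming the delayed_node_cache slab in delayed-inode.c:

	node = kmem_cache_zalloc(delayed_node_cache, GFP_NOFS);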
-
Alexandru Moise authored
Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Alexandru Moise authored
Conform to the __btrfs_fs_incompat() cast-to-bool (!!) by explicitly returning a boolean, not an int. Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
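A sketch of the pattern being conformed to (simplified; the !! already yields 0 or 1, so the return type can honestly be bool):

static inline bool __btrfs_fs_incompat(struct btrfs_fs_info *fs_info, u64 flag)
{
	struct btrfs_super_block *disk_super = fs_info->super_copy;

	return !!(btrfs_super_incompat_flags(disk_super) & flag);
}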
-
Byongho Lee authored
The inode argument has never been used, so remove it. Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Although we prefer to use separate caches for various structs, it seems better not to do that for struct btrfs_delalloc_work. Objects of this type are allocated rarely, when transaction commit calls btrfs_start_delalloc_roots, requesting delayed iputs. The objects are temporary (with some IO involved) but still allocated and freed within __start_delalloc_inodes, and memory allocation failure is handled. The slab cache is empty most of the time (observed on several systems), so if we need to allocate a new slab object, the first one has to allocate a full page. Under low-memory conditions this is more likely to fail than allocating from the generic slab caches. Signed-off-by: David Sterba <dsterba@suse.com>
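The change itself is small: allocate from the generic caches instead of a dedicated one; a simplified sketch of the allocation in btrfs_alloc_delalloc_work():

	work = kmalloc(sizeof(*work), GFP_NOFS);
	if (!work)
		return NULL;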
-
David Sterba authored
The helper btrfs_alloc_workqueue will add the "btrfs-" prefix. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can handle the special case of num_stripes == 0 directly inside btrfs_read_sys_array. The BUG_ON in btrfs_chunk_item_size is there to catch other unhandled cases where we fail to validate external data. A crafted or corrupted image crashes at mount time:

BTRFS: device fsid 9006933e-2a9a-44f0-917f-514252aeec2c devid 1 transid 7 /dev/loop0
BTRFS info (device loop0): disk space caching is enabled
BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()!
Kernel panic - not syncing: BUG!
CPU: 0 PID: 313 Comm: mount Not tainted 4.2.5-00657-ge047887-dirty #25
Stack:
 637af890 60062489 602aeb2e 604192ba
 60387961 00000011 637af8a0 6038a835
 637af9c0 6038776b 634ef32b 00000000
Call Trace:
 [<6001c86d>] show_stack+0xfe/0x15b
 [<6038a835>] dump_stack+0x2a/0x2c
 [<6038776b>] panic+0x13e/0x2b3
 [<6020f099>] btrfs_read_sys_array+0x25d/0x2ff
 [<601cfbbe>] open_ctree+0x192d/0x27af
 [<6019c2c1>] btrfs_mount+0x8f5/0xb9a
 [<600bc9a7>] mount_fs+0x11/0xf3
 [<600d5167>] vfs_kern_mount+0x75/0x11a
 [<6019bcb0>] btrfs_mount+0x2e4/0xb9a
 [<600bc9a7>] mount_fs+0x11/0xf3
 [<600d5167>] vfs_kern_mount+0x75/0x11a
 [<600d710b>] do_mount+0xa35/0xbc9
 [<600d7557>] SyS_mount+0x95/0xc8
 [<6001e884>] handle_syscall+0x6b/0x8e

Reported-by: Jiri Slaby <jslaby@suse.com>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
CC: stable@vger.kernel.org # 3.19+
Signed-off-by: David Sterba <dsterba@suse.com>
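A simplified sketch of the validation added to btrfs_read_sys_array() in volumes.c:

		num_stripes = btrfs_chunk_num_stripes(sb, chunk);
		if (!num_stripes) {
			printk(KERN_ERR
	"BTRFS: invalid number of stripes %u in sys_array at offset %u\n",
			       num_stripes, cur_offset);
			ret = -EIO;
			break;
		}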
-
David Sterba authored
btrfs_delayed_extent_op can be packed in a better way: it's 40 bytes now and has 8 unused bytes. Reducing the level type to u8 makes it possible to squeeze it into the padding byte after key. The bitfields were switched to bool as there's space to store the full byte without increasing the whole structure, and the generated assembly is smaller.

struct btrfs_delayed_extent_op {
	struct btrfs_disk_key      key;                  /*     0    17 */
	u8                         level;                /*    17     1 */
	bool                       update_key;           /*    18     1 */
	bool                       update_flags;         /*    19     1 */
	bool                       is_data;              /*    20     1 */

	/* XXX 3 bytes hole, try to pack */

	u64                        flags_to_set;         /*    24     8 */

	/* size: 32, cachelines: 1, members: 6 */
	/* sum members: 29, holes: 1, sum holes: 3 */
	/* last cacheline: 32 bytes */
};

The final size is 32 bytes, which gives +26 objects per slab page.

   text    data     bss     dec     hex filename
 938811   43670   23144 1005625   f5839 fs/btrfs/btrfs.ko.before
 938747   43670   23144 1005561   f57f9 fs/btrfs/btrfs.ko.after

Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Inodes for delayed iput allocate a trivial helper structure; let's place the list hook directly into the inode and save a kmalloc (killing a __GFP_NOFAIL as a bonus) at the cost of increasing the size of btrfs_inode. The inode can be put onto the delayed_iputs list more than once and we have to keep the count. This means we can't use list_splice to process a bunch of inodes, because we'd lose track of the count if the inode is put onto the delayed iputs list again while it's being processed. Signed-off-by: David Sterba <dsterba@suse.com>
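The inode-embedded hook described above amounts to two new members of struct btrfs_inode (sketch of the relevant part):

	/*
	 * Hook into fs_info->delayed_iputs; the inode can be queued for a
	 * delayed iput more than once, so a count is kept.
	 */
	struct list_head delayed_iput;
	long delayed_iput_count;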
-
Zhao Lei authored
Since we will add support for -d dup for non-mixed filesystems, the kernel needs to support converting to this raid type. This patch removes the limitation for the above case. Tested by the following script (combination of dup conversion with fsck):

export TEST_DEV='/dev/vdc'
export TEST_DIR='/var/ltf/tester/mnt'

do_dup_test() {
    local m_from="$1"
    local d_from="$2"
    local m_to="$3"
    local d_to="$4"

    echo "Convert from -m $m_from -d $d_from to -m $m_to -d $d_to"

    umount "$TEST_DIR" &>/dev/null
    ./mkfs.btrfs -f -m "$m_from" -d "$d_from" "$TEST_DEV" >/dev/null || return 1
    mount "$TEST_DEV" "$TEST_DIR" || return 1
    cp -a /sbin/* "$TEST_DIR"

    [[ "$m_from" != "$m_to" ]] && {
        ./btrfs balance start -f -mconvert="$m_to" "$TEST_DIR" || return 1
    }
    [[ "$d_from" != "$d_to" ]] && {
        local opt=()
        [[ "$d_to" == single ]] && opt+=("-f")
        ./btrfs balance start "${opt[@]}" -dconvert="$d_to" "$TEST_DIR" || return 1
    }

    umount "$TEST_DIR" || return 1
    ./btrfsck "$TEST_DEV" || return 1

    echo
    return 0
}

test_all() {
    for m_from in single dup; do
        for d_from in single dup; do
            for m_to in single dup; do
                for d_to in single dup; do
                    do_dup_test "$m_from" "$d_from" "$m_to" "$d_to" || return 1
                done
            done
        done
    done
}

test_all

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
We hit this panic on a few of our boxes this week where we have an ordered_extent with a NULL inode. We do an igrab() of the inode in writepages, but weren't doing it in writepage, which can be called directly from the VM on dirty pages. If the inode has been unlinked then it can have I_FREEING set, which means igrab() returns NULL and we get this panic. Fix this by trying to igrab in btrfs_writepage, and if it returns NULL then just redirty the page and return AOP_WRITEPAGE_ACTIVATE so the VM knows it wasn't successful. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
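A simplified sketch of the described fix in btrfs_writepage() (not the exact diff):

	struct inode *inode = page->mapping->host;

	/* igrab() returns NULL for an unlinked inode with I_FREEING set */
	if (!igrab(inode)) {
		redirty_page_for_writepage(wbc, page);
		return AOP_WRITEPAGE_ACTIVATE;
	}

	/* ... the regular extent writepage path runs here ... */

	btrfs_add_delayed_iput(inode);
	return ret;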
-
Anand Jain authored
Looks like an oversight: call brelse() when the checksum fails. Further down in the code, in the non-error path, we do call brelse(), which is why we don't see brelse() in the later goto error paths. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
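The fix is of this shape (a hedged sketch; reading the message, the call site appears to be the superblock checksum check in open_ctree(), and the snippet is illustrative rather than the exact diff):

	if (btrfs_check_super_csum(bh->b_data)) {
		printk(KERN_ERR "BTRFS: superblock checksum mismatch\n");
		err = -EINVAL;
		brelse(bh);
		goto fail_alloc;
	}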
-
- 30 Dec, 2015 2 commits
-
-
Chris Mason authored
This is a short-term solution to make sure btrfs_run_delayed_refs() doesn't change the extent tree while we are scanning it to create the free space tree. Longer term, we need to synchronize scanning the block groups one by one, similar to what happens during a balance. Signed-off-by: Chris Mason <clm@fb.com>
-
Chris Mason authored
Merging in the free space tree deleted a variable needed when CONFIG_BTRFS_DEBUG=y Signed-off-by: Chris Mason <clm@fb.com>
-
- 23 Dec, 2015 6 commits
-
-
Chris Mason authored
map->num_stripes really can't be zero, but just in case. Signed-off-by: Chris Mason <clm@fb.com>
-
Chris Mason authored
-
Chris Mason authored
Merge branch 'for-chris-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.5
-
Chris Mason authored
Merge branch 'dev/simplify-set-bit' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 Signed-off-by: Chris Mason <clm@fb.com>
-
Chris Mason authored
Merge branch 'dev/gfp-flags' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
-
Chris Mason authored
Merge branch 'cleanup/misc-simplify' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
-