Commits · a8ac900b8163703340a2fdad11c32f96b8fe686d · Kirill Smelkov / linux

05 Sep, 2014 2 commits

ext4: use non-movable memory for the ext4 superblock · a8ac900b

Gioh Kim authored Sep 04, 2014

Since the ext4 superblock is not released until the file system is
unmounted, allocate the buffer cache entry for the ext4 superblock out
of the non-moveable are to allow page migrations and thus CMA
allocations to more easily succeed if the CMA area is limited.
Signed-off-by: Gioh Kim <gioh.kim@lge.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

a8ac900b

fs/buffer.c: support buffer cache allocations with gfp modifiers · 3b5e6454

Gioh Kim authored Sep 04, 2014

A buffer cache is allocated from movable area because it is referred
for a while and released soon.  But some filesystems are taking buffer
cache for a long time and it can disturb page migration.

New APIs are introduced to allocate buffer cache with user specific
flag.  *_gfp APIs are for user want to set page allocation flag for
page cache allocation.  And *_unmovable APIs are for the user wants to
allocate page cache from non-movable area.
Signed-off-by: Gioh Kim <gioh.kim@lge.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

3b5e6454

04 Sep, 2014 6 commits

ext4: renumber EXT4_EX_* flags to avoid flag aliasing problems · d26e2c4d
Theodore Ts'o authored Sep 04, 2014
```
Suggested-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
```
d26e2c4d

jbd2: optimize jbd2_log_do_checkpoint() a bit · 0e5ecf0a

Jan Kara authored Sep 04, 2014

When we discover written out buffer in transaction checkpoint list we
don't have to recheck validity of a transaction. Either this is the
last buffer in a transaction - and then we are done - or this isn't
and then we can just take another buffer from the checkpoint list
without dropping j_list_lock.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

0e5ecf0a

jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint() · dc6e8d66

Theodore Ts'o authored Sep 04, 2014

The __jbd2_journal_remove_checkpoint() doesn't require an elevated
b_count; indeed, until the jh structure gets released by the call to
jbd2_journal_put_journal_head(), the bh's b_count is elevated by
virtue of the existence of the jh structure.
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

dc6e8d66

ext4: drop the EXT4_STATE_DELALLOC_RESERVED flag · 754cfed6

Theodore Ts'o authored Sep 04, 2014

Having done a full regression test, we can now drop the
DELALLOC_RESERVED state flag.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

754cfed6

ext4: prepare to drop EXT4_STATE_DELALLOC_RESERVED · e3cf5d5d

Theodore Ts'o authored Sep 04, 2014

The EXT4_STATE_DELALLOC_RESERVED flag was originally implemented
because it was too hard to make sure the mballoc and get_block flags
could be reliably passed down through all of the codepaths that end up
calling ext4_mb_new_blocks().

Since then, we have mb_flags passed down through most of the code
paths, so getting rid of EXT4_STATE_DELALLOC_RESERVED isn't as tricky
as it used to.

This commit plumbs in the last of what is required, and then adds a
WARN_ON check to make sure we haven't missed anything.  If this passes
a full regression test run, we can then drop
EXT4_STATE_DELALLOC_RESERVED.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

e3cf5d5d

ext4: pass allocation_request struct to ext4_(alloc,splice)_branch · a5211002

Theodore Ts'o authored Sep 04, 2014

Instead of initializing the allocation_request structure in
ext4_alloc_branch(), set it up in ext4_ind_map_blocks(), and then pass
it to ext4_alloc_branch() and ext4_splice_branch().

This allows ext4_ind_map_blocks to pass flags in the allocation
request structure without having to add Yet Another argument to
ext4_alloc_branch().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

a5211002

02 Sep, 2014 6 commits

ext4: track extent status tree shrinker delay statictics · eb68d0e2

Zheng Liu authored Sep 01, 2014

This commit adds some statictics in extent status tree shrinker.  The
purpose to add these is that we want to collect more details when we
encounter a stall caused by extent status tree shrinker.  Here we count
the following statictics:
  stats:
    the number of all objects on all extent status trees
    the number of reclaimable objects on lru list
    cache hits/misses
    the last sorted interval
    the number of inodes on lru list
  average:
    scan time for shrinking some objects
    the number of shrunk objects
  maximum:
    the inode that has max nr. of objects on lru list
    the maximum scan time for shrinking some objects

The output looks like below:
  $ cat /proc/fs/ext4/sda1/es_shrinker_info
  stats:
    28228 objects
    6341 reclaimable objects
    5281/631 cache hits/misses
    586 ms last sorted interval
    250 inodes on lru list
  average:
    153 us scan time
    128 shrunk objects
  maximum:
    255 inode (255 objects, 198 reclaimable)
    125723 us max scan time

If the lru list has never been sorted, the following line will not be
printed:
    586ms last sorted interval
If there is an empty lru list, the following lines also will not be
printed:
    250 inodes on lru list
  ...
  maximum:
    255 inode (255 objects, 198 reclaimable)
    0 us max scan time

Meanwhile in this commit a new trace point is defined to print some
details in __ext4_es_shrink().

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

eb68d0e2

ext4: improve extents status tree trace point · e963bb1d

Zheng Liu authored Sep 01, 2014

This commit improves the trace point of extents status tree.  We rename
trace_ext4_es_shrink_enter in ext4_es_count() because it is also used
in ext4_es_scan() and we can not identify them from the result.

Further this commit fixes a variable name in trace point in order to
keep consistency with others.

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e963bb1d

ext4: fix comments about get_blocks · d91bd2c1

Seunghun Lee authored Sep 01, 2014

get_blocks is renamed to get_block.
Signed-off-by: Seunghun Lee <waydi1@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

d91bd2c1

ext4: enable block_validity by default · 45f1a9c3

Darrick J. Wong authored Sep 01, 2014

Enable by default the block_validity feature, which checks for
collisions between newly allocated blocks and critical system
metadata.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

45f1a9c3

jbd2: fold __wait_cp_io into jbd2_log_do_checkpoint() · 88fe1acb

Theodore Ts'o authored Sep 01, 2014

__wait_cp_io() is only called by jbd2_log_do_checkpoint().  Fold it in
to make it a bit easier to understand.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

88fe1acb

jbd2: fold __process_buffer() into jbd2_log_do_checkpoint() · be1158cc

Theodore Ts'o authored Sep 01, 2014

__process_buffer() is only called by jbd2_log_do_checkpoint(), and it
had a very complex locking protocol where it would be called with the
j_list_lock, and sometimes exit with the lock held (if the return code
was 0), or release the lock.

This was confusing both to humans and to smatch (which erronously
complained that the lock was taken twice).

Folding __process_buffer() to the caller allows us to simplify the
control flow, making the resulting function easier to read and reason
about, and dropping the compiled size of fs/jbd2/checkpoint.c by 150
bytes (over 4% of the text size).
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

be1158cc

01 Sep, 2014 12 commits

ext4: rename ext4_ext_find_extent() to ext4_find_extent() · ed8a1a76
Theodore Ts'o authored Sep 01, 2014
```
Make the function name less redundant.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
```
ed8a1a76

ext4: reuse path object in ext4_move_extents() · 3bdf14b4

Theodore Ts'o authored Sep 01, 2014

Reuse the path object in ext4_move_extents() so we don't unnecessarily
free and reallocate it.

Also clean up the get_ext_path() wrapper so that it has the same
semantics of freeing the path object on error as ext4_ext_find_extent().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

3bdf14b4

ext4: reuse path object in ext4_ext_shift_extents() · ee4bd0d9

Theodore Ts'o authored Sep 01, 2014

Now that the semantics of ext4_ext_find_extent() are much cleaner,
it's safe and more efficient to reuse the path object across the
multiple calls to ext4_ext_find_extent() in ext4_ext_shift_extents().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ee4bd0d9

ext4: teach ext4_ext_find_extent() to realloc path if necessary · 10809df8

Theodore Ts'o authored Sep 01, 2014

This adds additional safety in case for some reason we end reusing a
path structure which isn't big enough for current depth of the inode.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

10809df8

ext4: allow a NULL argument to ext4_ext_drop_refs() · b7ea89ad

Theodore Ts'o authored Sep 01, 2014

Teach ext4_ext_drop_refs() to accept a NULL argument, much like
kfree().  This allows us to drop a lot of checks to make sure path is
non-NULL before calling ext4_ext_drop_refs().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

b7ea89ad

ext4: call ext4_ext_drop_refs() from ext4_ext_find_extent() · 523f431c

Theodore Ts'o authored Sep 01, 2014

In nearly all of the calls to ext4_ext_find_extent() where the caller
is trying to recycle the path object, ext4_ext_drop_refs() gets called
to release the buffer heads before the path object gets overwritten.
To simplify things for the callers, and to avoid the possibility of a
memory leak, make ext4_ext_find_extent() responsible for dropping the
buffers.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

523f431c

ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling code · dfe50809

Theodore Ts'o authored Sep 01, 2014

Drop EXT4_EX_NOFREE_ON_ERR from ext4_ext_create_new_leaf(),
ext4_split_extent(), ext4_convert_unwritten_extents_endio().

This requires fixing all of their callers to potentially
ext4_ext_find_extent() to free the struct ext4_ext_path object in case
of an error, and there are interlocking dependencies all the way up to
ext4_ext_map_blocks(), ext4_swap_extents(), and
ext4_ext_remove_space().

Once this is done, we can drop the EXT4_EX_NOFREE_ON_ERR flag since it
is no longer necessary.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

dfe50809

ext4: drop EXT4_EX_NOFREE_ON_ERR in convert_initialized_extent() · 4f224b8b

Theodore Ts'o authored Sep 01, 2014

Transfer responsibility of freeing struct ext4_ext_path on error to
ext4_ext_find_extent().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

4f224b8b

ext4: collapse ext4_convert_initialized_extents() · e8b83d93

Theodore Ts'o authored Sep 01, 2014

The function ext4_convert_initialized_extents() is only called by a
single function --- ext4_ext_convert_initalized_extents().  Inline the
code and get rid of the unnecessary bits in order to simplify the code.

Rename ext4_ext_convert_initalized_extents() to
convert_initalized_extents() since it's a static function that is
actually only used in a single caller, ext4_ext_map_blocks().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e8b83d93

ext4: teach ext4_ext_find_extent() to free path on error · 705912ca

Theodore Ts'o authored Sep 01, 2014

Right now, there are a places where it is all to easy to leak memory
on an error path, via a usage like this:

	struct ext4_ext_path *path = NULL

	while (...) {
		...
		path = ext4_ext_find_extent(inode, block, path, 0);
		if (IS_ERR(path)) {
			/* oops, if path was non-NULL before the call to
			   ext4_ext_find_extent, we've leaked it!  :-(  */
			...
			return PTR_ERR(path);
		}
		...
	}

Unfortunately, there some code paths where we are doing the following
instead:

	path = ext4_ext_find_extent(inode, block, orig_path, 0);

and where it's important that we _not_ free orig_path in the case
where ext4_ext_find_extent() returns an error.

So change the function signature of ext4_ext_find_extent() so that it
takes a struct ext4_ext_path ** for its third argument, and by
default, on an error, it will free the struct ext4_ext_path, and then
zero out the struct ext4_ext_path * pointer.  In order to avoid
causing problems, we add a flag EXT4_EX_NOFREE_ON_ERR which causes
ext4_ext_find_extent() to use the original behavior of forcing the
caller to deal with freeing the original path pointer on the error
case.

The goal is to get rid of EXT4_EX_NOFREE_ON_ERR entirely, but this
allows for a gentle transition and makes the patches easier to verify.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

705912ca

ext4: fix accidental flag aliasing in ext4_map_blocks flags · bd30d702

Theodore Ts'o authored Sep 01, 2014

Commit b8a86845 introduced an accidental flag aliasing between
EXT4_EX_NOCACHE and EXT4_GET_BLOCKS_CONVERT_UNWRITTEN.

Fortunately, this didn't introduce any untorward side effects --- we
got lucky.  Nevertheless, fix this and leave a warning to hopefully
avoid this from happening in the future.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

bd30d702

ext4: fix ZERO_RANGE bug hidden by flag aliasing · 713e8dde

Theodore Ts'o authored Sep 01, 2014

We accidently aliased EXT4_EX_NOCACHE and EXT4_GET_CONVERT_UNWRITTEN
falgs, which apparently was hiding a bug that was unmasked when this
flag aliasing issue was addressed (see the subsequent commit).  The
reproduction case was:

   fsx -N 10000 -l 500000 -r 4096 -t 4096 -w 4096 -Z -R -W /vdb/junk

... which would cause fsx to report corruption in the data file.

The fix we have is a bit of an overkill, but I'd much rather be
conservative for now, and we can optimize ZERO_RANGE_FL handling
later.  The fact that we need to zap the extent_status cache for the
inode is unfortunate, but correctness is far more important than
performance.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Namjae Jeon <namjae.jeon@samsung.com>

713e8dde

31 Aug, 2014 4 commits

ext4: fix ext4_swap_extents() error handling · 19008f6d

Theodore Ts'o authored Aug 31, 2014

If ext4_ext_find_extent() returns an error, we have to clear path1 or
path2 or else we would end up trying to free an ERR_PTR, which would
be bad.

Also eliminate some redundant code and mark the error paths as unlikely()
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

19008f6d

ext4: refactor ext4_move_extents code base · fcf6b1b7

Dmitry Monakhov authored Aug 30, 2014

ext4_move_extents is too complex for review. It has duplicate almost
each function available in the rest of other codebase. It has useless
artificial restriction orig_offset == donor_offset. But in fact logic
of ext4_move_extents is very simple:

Iterate extents one by one (similar to ext4_fill_fiemap_extents)
   ->Iterate each page covered extent (similar to generic_perform_write)
     ->swap extents for covered by page (can be shared with IOC_MOVE_DATA)
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

fcf6b1b7

ext4: use ext4_ext_next_allocated_block instead of mext_next_extent · f8fb4f41

Dmitry Monakhov authored Aug 30, 2014

This allows us to make mext_next_extent static and potentially get rid
of it.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

f8fb4f41

ext4: use ext4_update_i_disksize instead of opencoded ones · ee124d27
Dmitry Monakhov authored Aug 30, 2014
```
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
```
ee124d27

30 Aug, 2014 6 commits
- ext4: remove a duplicate call in ext4_init_new_dir() · 52c826db
  Wang Shilong authored Aug 29, 2014
```
ext4_journal_get_write_access() has just been called in ext4_append()
calling it again here is duplicated.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
```
  52c826db
- ext4: convert do_split() to use the ERR_PTR convention · f8b3b59d
  Theodore Ts'o authored Aug 29, 2014
```
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
```
  f8b3b59d
- ext4: convert dx_probe() to use the ERR_PTR convention · dd73b5d5
  Theodore Ts'o authored Aug 29, 2014
```
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
```
  dd73b5d5
- ext4: convert ext4_bread() to use the ERR_PTR convention · 1c215028
  Theodore Ts'o authored Aug 29, 2014
```
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
```
  1c215028
- ext4: convert ext4_getblk() to use the ERR_PTR convention · 10560082
  Theodore Ts'o authored Aug 29, 2014
```
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
```
  10560082
- ext4: convert ext4_dx_find_entry() to use the ERR_PTR convention · 537d8f93
  Theodore Ts'o authored Aug 29, 2014
```
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
```
  537d8f93
29 Aug, 2014 4 commits

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · d4f03186

Linus Torvalds authored Aug 29, 2014

Pull ext4 bugfixes from Ted Ts'o:
 "Ext4 bug fixes for 3.17, to provide better handling of memory
  allocation failures, and to fix some journaling bugs involving
  journal checksums and FALLOC_FL_ZERO_RANGE"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix same-dir rename when inline data directory overflows
  jbd2: fix descriptor block size handling errors with journal_csum
  jbd2: fix infinite loop when recovering corrupt journal blocks
  ext4: update i_disksize coherently with block allocation on error path
  ext4: fix transaction issues for ext4_fallocate and ext_zero_range
  ext4: fix incorect journal credits reservation in ext4_zero_range
  ext4: move i_size,i_disksize update routines to helper function
  ext4: fix BUG_ON in mb_free_blocks()
  ext4: propagate errors up to ext4_find_entry()'s callers

d4f03186

Merge tag 'dm-3.17-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · ef13c8af

Linus Torvalds authored Aug 29, 2014

Pull device mapper fix from Mike Snitzer:
 "Fix a 3.17-rc1 regression introduced by switching the DM crypt target
  to using per-bio data"

* tag 'dm-3.17-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm crypt: fix access beyond the end of allocated space

ef13c8af

Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 522a15db

Linus Torvalds authored Aug 29, 2014

Pull block layer fixes from Jens Axboe:
 "A smaller collection of fixes that have come up since the initial
  merge window pull request.  This contains:

   - error handling cleanup and support for larger than 16 byte cdbs in
     sg_io() from Christoph.  The latter just matches what bsg and
     friends support, sg_io() got left out in the merge.

   - an option for brd to expose partitions in /proc/partitions.  They
     are hidden by default for compat reasons.  From Dmitry Monakhov.

   - a few blk-mq fixes from me - killing a dead/unused flag, fix for
     merging happening even if turned off, and correction of a few
     comments.

   - removal of unnecessary ->owner setting in systemace.  From Michal
     Simek.

   - two related fixes for a problem with nesting freezing of queues in
     blk-mq.  One from Ming Lei removing an unecessary freeze operation,
     and another from Tejun fixing the nesting regression introduced in
     the merge window.

   - fix for a BUG_ON() at bio_endio time when protection info is
     attached and the IO has an error.  From Sagi Grimberg.

   - two scsi_ioctl bug fixes for regressions with scsi-mq from Tony
     Battersby.

   - a cfq weight update fix and subsequent comment update from Toshiaki
     Makita"

* 'for-linus' of git://git.kernel.dk/linux-block:
  cfq-iosched: Add comments on update timing of weight
  cfq-iosched: Fix wrong children_weight calculation
  block: fix error handling in sg_io
  fix regression in SCSI_IOCTL_SEND_COMMAND
  scsi-mq: fix requests that use a separate CDB buffer
  block: support > 16 byte CDBs for SG_IO
  block: cleanup error handling in sg_io
  brd: add ram disk visibility option
  block: systemace: Remove .owner field for driver
  blk-mq: blk_mq_freeze_queue() should allow nesting
  blk-mq: correct a few wrong/bad comments
  block: Fix BUG_ON when pi errors occur
  blk-mq: don't allow merges if turned off for the queue
  blk-mq: get rid of unused BLK_MQ_F_SHOULD_SORT flag
  blk-mq: fix WARNING "percpu_ref_kill() called more than once!"

522a15db

alpha: io: implement relaxed accessor macros for writes · 9e36c633

Will Deacon authored Jul 24, 2014

write{b,w,l,q}_relaxed are implemented by some architectures in order to
permit memory-mapped I/O writes with weaker barrier semantics than the
non-relaxed variants.

This patch implements these write macros for Alpha, in the same vein as
the relaxed read macros, which are already implemented.
Acked-by: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9e36c633