Commits · fe94793e555f650fab656649521fc38aaab4874e · Kirill Smelkov / linux

22 Jul, 2016 1 commit

f2fs: get victim segment again after new cp · fe94793e

Yunlei He authored Jul 22, 2016

A previously selected victim segment may become free after write_checkpoint.
If we garbage collect this segment and new_curseg then happens to reuse it,
f2fs_bug_on may be triggered as below.

	panic+0x154/0x29c
	do_garbage_collect+0x15c/0xaf4
	f2fs_gc+0x2dc/0x444
	f2fs_balance_fs.part.22+0xcc/0x14c
	f2fs_balance_fs+0x28/0x34
	f2fs_map_blocks+0x5ec/0x790
	f2fs_preallocate_blocks+0xe0/0x100
	f2fs_file_write_iter+0x64/0x11c
	new_sync_write+0xac/0x11c
	vfs_write+0x144/0x1e4
	SyS_write+0x60/0xc0

Here, we may check the SIT and SSA type during reset_curseg. So, check whether
the segment is stale, and select a new victim if it is, to avoid this.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

fe94793e
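
The re-selection described in the commit above can be sketched in user space roughly as follows. This is a toy model and not the kernel code: `struct seg`, `pick_victim` and `revalidate_victim` are hypothetical names, and the greedy policy is only an assumption made for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified segment: only the live-block count matters here. */
struct seg {
	unsigned int valid_blocks;
};

/* Greedy pick: the non-empty segment with the fewest valid blocks, or -1. */
static int pick_victim(const struct seg *segs, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (segs[i].valid_blocks == 0)
			continue;	/* already free, nothing to collect */
		if (best < 0 || segs[i].valid_blocks < segs[best].valid_blocks)
			best = (int)i;
	}
	return best;
}

/*
 * Call after a checkpoint: keep the old victim only if it still holds
 * valid blocks, otherwise treat it as stale and select again.
 */
static int revalidate_victim(const struct seg *segs, size_t n, int victim)
{
	if (victim >= 0 && (size_t)victim < n && segs[victim].valid_blocks > 0)
		return victim;
	return pick_victim(segs, n);
}
```

The point of the sketch is only the ordering: the victim chosen before the checkpoint is never trusted afterwards without a fresh look at its state.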

20 Jul, 2016 4 commits

f2fs: handle error case with f2fs_bug_on · 6f3ec995

Jaegeuk Kim authored Jul 19, 2016

It's enough to show BUG or WARN via f2fs_bug_on in the error case.
Then, we don't need to leave the filesystem corrupted.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

6f3ec995

f2fs: avoid data race when deciding checkpoint in f2fs_sync_file · dd11a5df

Jaegeuk Kim authored Jul 19, 2016

When the filesystem is almost full, f2fs_sync_file should do a checkpoint if
there is not enough space for a later roll-forward (i.e. space_for_roll_forward).
Currently we have no lock for sbi->alloc_valid_block_count, resulting in a
race condition.

In rare cases, we can get -ENOSPC when doing roll-forward, which triggers

	if (is_valid_blkaddr(sbi, dest, META_POR)) {
		if (src == NULL_ADDR) {
			err = reserve_new_block(&dn);
			f2fs_bug_on(sbi, err);
			...
		}
		...
	}
in do_recover_data.

So, this patch avoids that situation in advance.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

dd11a5df
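
A minimal user-space sketch of the idea in the commit above: the space check reads the block counters under the same lock that allocators take, so it cannot observe a half-updated value. Everything here is an assumption for illustration: `struct sb_model`, the `user_block_count` and `last_valid_block_count` fields, and the use of a pthread mutex. The kernel patch may use a different primitive.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock counters involved. */
struct sb_model {
	pthread_mutex_t lock;
	uint64_t user_block_count;	  /* blocks usable by users */
	uint64_t last_valid_block_count;  /* valid blocks at last checkpoint */
	uint64_t alloc_valid_block_count; /* blocks allocated since then */
};

/* Writers update the counter under the lock. */
static void alloc_blocks(struct sb_model *sb, uint64_t n)
{
	pthread_mutex_lock(&sb->lock);
	sb->alloc_valid_block_count += n;
	pthread_mutex_unlock(&sb->lock);
}

/* The reader takes the same lock, so the decision uses a consistent value. */
static bool space_for_roll_forward(struct sb_model *sb)
{
	bool ok;

	pthread_mutex_lock(&sb->lock);
	ok = sb->last_valid_block_count + sb->alloc_valid_block_count
		<= sb->user_block_count;
	pthread_mutex_unlock(&sb->lock);
	return ok;
}
```

When `space_for_roll_forward` returns false, the caller would fall back to a full checkpoint instead of relying on roll-forward recovery.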

f2fs: support an ioctl to move a range of data blocks · 4dd6f977

Jaegeuk Kim authored Jul 08, 2016

This patch implements moving a range of data blocks from source file to
destination file.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

4dd6f977
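
A user-space call to such an ioctl might look like the sketch below. The commit message does not show the interface, so the structure layout, the magic number and the command number are written from memory of the f2fs UAPI header and should be treated as assumptions to verify against the kernel headers in use. `move_range` is a hypothetical wrapper name.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Assumed layout; check against the f2fs UAPI header before relying on it. */
struct f2fs_move_range {
	uint32_t dst_fd;	/* destination file descriptor */
	uint64_t pos_in;	/* start offset in the source file */
	uint64_t pos_out;	/* start offset in the destination file */
	uint64_t len;		/* number of bytes to move */
};

/* Assumed values, not taken from the commit message. */
#define F2FS_IOCTL_MAGIC	0xf5
#define F2FS_IOC_MOVE_RANGE	_IOWR(F2FS_IOCTL_MAGIC, 9, struct f2fs_move_range)

/* Hypothetical wrapper: returns 0 on success, -1 with errno set on failure. */
static int move_range(int src_fd, int dst_fd,
		      uint64_t pos_in, uint64_t pos_out, uint64_t len)
{
	struct f2fs_move_range range = {
		.dst_fd = (uint32_t)dst_fd,
		.pos_in = pos_in,
		.pos_out = pos_out,
		.len = len,
	};

	return ioctl(src_fd, F2FS_IOC_MOVE_RANGE, &range);
}
```

Both descriptors would need to refer to files on the same f2fs mount, and offsets and length would most likely need to be block aligned; neither constraint is stated in the commit message.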

f2fs: fix to report error number of f2fs_find_entry · 91246c21

Chao Yu authored Jul 19, 2016

This patch makes f2fs_find_entry report the right error number to its
caller.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

91246c21

18 Jul, 2016 1 commit

f2fs: avoid memory allocation failure due to a long length · 363cad7f

Jaegeuk Kim authored Jul 16, 2016

We need to avoid ENOMEM due to unexpected long length.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

363cad7f

15 Jul, 2016 7 commits

f2fs: reset default idle interval value · dcf25fe8

Chao Yu authored Jul 15, 2016

The default idle interval is 2 mins, but most of the time after the screen
shuts down there are still operations within that 2 mins interval, and the
GC thread's sleep time is about 30 secs to 60 secs, so the GC thread has
almost no chance to do garbage collection.

Change the default idle interval from 2 mins to 5 secs to fix this.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

dcf25fe8
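
The effect of the interval can be illustrated with a small sketch. `is_idle` is a hypothetical helper, not the kernel function; it only models "no operation seen for at least the idle interval".

```c
#include <stdbool.h>

/* True if no operation happened during the last interval_secs seconds. */
static bool is_idle(long now_secs, long last_op_secs, long interval_secs)
{
	return now_secs - last_op_secs >= interval_secs;
}
```

If the GC thread wakes up 40 seconds after the last operation, `is_idle(100, 60, 120)` is false with the old 2 mins default, so GC is skipped, while `is_idle(100, 60, 5)` is true with the new 5 secs default.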

f2fs: use blk_plug in all the possible paths · 9dfa1baf

Jaegeuk Kim authored Jul 13, 2016

This patch reverts 19a5f5e2 (f2fs: drop any block plugging),
and adds blk_plug in write paths additionally.

The main reason is that blk_start_plug can be used to wake up from low-power
mode before submitting further bios.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

9dfa1baf

f2fs: fix to avoid data update racing between GC and DIO · 82e0a5aa

Chao Yu authored Jul 13, 2016

Data in a file can be operated on by GC and DIO simultaneously, so we can
hit the races below:

For write case:
Thread A				Thread B
- generic_file_direct_write
 - invalidate_inode_pages2_range
 - f2fs_direct_IO
  - do_blockdev_direct_IO
   - do_direct_IO
    - get_more_blocks
					- f2fs_gc
					 - do_garbage_collect
					  - gc_data_segment
					   - move_data_page
					    - do_write_data_page
					    migrate data block to new block address
   - dio_bio_submit
   update user data to old block address

For read case:
Thread A                                Thread B
- generic_file_direct_write
 - invalidate_inode_pages2_range
 - f2fs_direct_IO
  - do_blockdev_direct_IO
   - do_direct_IO
    - get_more_blocks
					- f2fs_balance_fs
					 - f2fs_gc
					  - do_garbage_collect
					   - gc_data_segment
					    - move_data_page
					     - do_write_data_page
					     migrate data block to new block address
					  - write_checkpoint
					   - do_checkpoint
					    - clear_prefree_segments
					     - f2fs_issue_discard
                                             discard old block address
   - dio_bio_submit
   update user buffer from obsolete block address

To fix this, DIO and GC on the same file should be made mutually
exclusive.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

82e0a5aa

f2fs: add maximum prefree segments · 44a83499

Jaegeuk Kim authored Jul 13, 2016

On 1TB storage, we need to admit 22841 prefree segments, which can consume
too many segments.
This patch caps prefree segments at 8GB in that case.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

44a83499
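
The arithmetic behind the cap can be sketched as below. The 2MB segment size and the percentage-based threshold are assumptions, not stated in the commit message; with them, 22841 is consistent with 5% of roughly 456820 main-area segments, and 8GB corresponds to 4096 segments. `prefree_limit` is a hypothetical helper.

```c
#include <stdint.h>

#define SEGMENT_SIZE_MB	2u		/* assumption: 2MB per segment */
#define MAX_PREFREE_MB	(8u * 1024u)	/* the 8GB cap from the commit */

/* Prefree threshold: a percentage of main segments, capped at 8GB worth. */
static uint32_t prefree_limit(uint32_t main_segments, uint32_t percent)
{
	uint64_t by_ratio = (uint64_t)main_segments * percent / 100;
	uint32_t cap = MAX_PREFREE_MB / SEGMENT_SIZE_MB;	/* 4096 segments */

	return by_ratio < cap ? (uint32_t)by_ratio : cap;
}
```

Small volumes keep the ratio-based value, while large ones stop at the cap instead of growing with capacity.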

f2fs: disable extent_cache for fcollapse/finsert inodes · 5f281fab

Jaegeuk Kim authored Jul 12, 2016

This reduces the elapsed time to do xfstests/generic/017.

Before: 458 s
After:  390 s
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

5f281fab

f2fs: refactor __exchange_data_block for speed up · 0a2aa8fb

Jaegeuk Kim authored Jul 08, 2016

This reduces the elapsed time to do xfstests/generic/017.

Before: 715 s
After:  458 s
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

0a2aa8fb

f2fs: fix ERR_PTR returned by bio · 1d353eb7

Jaegeuk Kim authored Jul 12, 2016

This fixes the wrong error pointer handling flow reported by Dan.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

1d353eb7

08 Jul, 2016 15 commits

f2fs: avoid mark_inode_dirty · b56ab837

Jaegeuk Kim authored Jun 30, 2016

Let's check inode's dirtiness before calling mark_inode_dirty.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

b56ab837

f2fs: move i_size_write in f2fs_write_end · a2ee0a30

Jaegeuk Kim authored Jul 07, 2016

We don't need to do i_size_write under page lock.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

a2ee0a30

f2fs: fix to avoid redundant discard during fstrim · c24a0fd6

Chao Yu authored Jul 07, 2016

With the test steps below, f2fs issues redundant discards when doing fstrim.
The reason is that we issue discards for both prefree segments and the
consecutive freed regions the user wants to trim, and the regions they cover
partly overlap. Change this so that no discards are issued for prefree
segments in the trimmed range.

1. mount -t f2fs -o discard /dev/zram0 /mnt/f2fs
2. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/
3. dd if=/dev/zero  of=/mnt/f2fs/a bs=2M count=1
4. dd if=/dev/zero  of=/mnt/f2fs/b bs=1M count=1
5. sync
6. rm /mnt/f2fs/a /mnt/f2fs/b
7. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/

Before:
<...>-5428  [001] ...1  9511.052125: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x200
<...>-5428  [001] ...1  9511.052787: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300

After:
<...>-6764  [000] ...1  9720.382504: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

c24a0fd6

f2fs: avoid mismatching block range for discard · c7b41e16

Yunlei He authored Jul 07, 2016

This patch skips discarding block ranges that are smaller than trim_minlen
and cannot be merged with a neighbour.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

c7b41e16

f2fs: fix incorrect f_bfree calculation in ->statfs · 3e6d0b4d

Chao Yu authored Jul 06, 2016

As the manual describes, f_bfree indicates the total free blocks in the fs.
In f2fs, it includes two parts: visible free blocks and over-provision blocks.
This patch corrects the calculation.

fsblkcnt_t   f_bfree;   /* free blocks in fs */
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3e6d0b4d
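
The two parts named in the commit above can be written out as a small calculation. The structure and field names are hypothetical; only the rule "total free = visible free + over-provision" comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical block accounting, all values in filesystem blocks. */
struct space_model {
	uint64_t user_blocks;	/* blocks exposed to users */
	uint64_t valid_blocks;	/* blocks currently holding valid data */
	uint64_t ovp_blocks;	/* over-provision blocks */
};

/* Free blocks that users can see. */
static uint64_t visible_free_blocks(const struct space_model *s)
{
	return s->user_blocks - s->valid_blocks;
}

/* Candidate value for f_bfree: visible free plus over-provision blocks. */
static uint64_t total_free_blocks(const struct space_model *s)
{
	return visible_free_blocks(s) + s->ovp_blocks;
}
```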

f2fs: use percpu_rw_semaphore · ec795418

Jaegeuk Kim authored Jun 30, 2016

This patch replaces rw_semaphore with percpu_rw_semaphore for:
sbi->cp_rwsem
nm_i->nat_tree_lock
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

ec795418

f2fs: skip to check the block address of node page · 3bdad3c7

Jaegeuk Kim authored Jun 30, 2016

If the node page is up-to-date, it should be alive.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3bdad3c7

f2fs: shrink critical region in spin_lock · 2555a2d5

Jaegeuk Kim authored Jun 30, 2016

This patch shrinks the critical region in spin_lock.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

2555a2d5

f2fs: call SetPageUptodate if needed · 237c0790

Jaegeuk Kim authored Jun 30, 2016

SetPageUptodate() issues a memory barrier, resulting in performance degradation.
Let's avoid that.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

237c0790
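
The pattern in the commit above is "test the flag first, and only pay for the ordered store when it is not yet set". A user-space analogy with C11 atomics is sketched below; `struct page_model` and `mark_uptodate_if_needed` are hypothetical, and the release store only stands in for the barrier that the kernel helper issues.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page with a single up-to-date flag. */
struct page_model {
	atomic_bool uptodate;
};

/* Returns true if the flag had to be set, false if it was already set. */
static bool mark_uptodate_if_needed(struct page_model *page)
{
	/* Cheap check first: most calls find the flag already set. */
	if (atomic_load_explicit(&page->uptodate, memory_order_acquire))
		return false;

	/* Ordered store only on the slow path. */
	atomic_store_explicit(&page->uptodate, true, memory_order_release);
	return true;
}
```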

f2fs: introduce f2fs_set_page_dirty_nobuffer · fe76b796

Jaegeuk Kim authored Jun 30, 2016

This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer.
When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the
overall performance.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

fe76b796

f2fs: remove unnecessary goto statement · a0995af6

Tiezhu Yang authored Jun 28, 2016

When base_addr is NULL, there is no need to call kzfree,
it should return -ENOMEM directly. Additionally, it is
better to initialize variable 'error' with 0.
Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

a0995af6

f2fs: add nodiscard mount option · 64058be9

Chao Yu authored Jul 03, 2016

This patch adds 'nodiscard' mount option.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

64058be9

f2fs: fix to redirty page if fail to gc data page · 72e1c797

Chao Yu authored Jul 03, 2016

If we fail to move data page during foreground GC, we should give another
chance to writeback that page which was set dirty previously by writer.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

72e1c797

f2fs: fix to detect truncation prior rather than EIO during read · 1563ac75

Chao Yu authored Jul 03, 2016

In a synchronized read, after sending out the read request, the reader
tries to lock the page to wait for the device to finish the read job and
unlock the page. Meanwhile, a truncater can race with the reader. So after
the reader gets the page lock, it should check the page's mapping to detect
whether someone has truncated the page in the meantime. The reader then has
the chance to retry if truncation was done; without this check, the read can
fail due to the previous condition check.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

1563ac75

f2fs: fix to avoid reading out encrypted data in page cache · 78682f79

Chao Yu authored Jul 03, 2016

For an encrypted inode, if the user overwrites data of the inode, f2fs will
read encrypted data into the page cache, and then do the decryption.

However, a reader can race with the overwriter and see encrypted data that
has not been decrypted by the overwriter yet. Fix it by moving the decryption
work to the background and keeping the page non-uptodate until data is decrypted.

Thread A				Thread B
- f2fs_file_write_iter
 - __generic_file_write_iter
  - generic_perform_write
   - f2fs_write_begin
    - f2fs_submit_page_bio
					- generic_file_read_iter
					 - do_generic_file_read
					  - lock_page_killable
					  - unlock_page
					  - copy_page_to_iter
					  hit the encrypted data in updated page
    - lock_page
    - fscrypt_decrypt_page
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

78682f79

06 Jul, 2016 6 commits

f2fs: avoid latency-critical readahead of node pages · ac6f1999

Jaegeuk Kim authored Jun 16, 2016

f2fs_map_blocks is performance-critical, so let's avoid any latency from
reading ahead node pages.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

ac6f1999

f2fs: avoid writing node/metapages during writes · 2c237eba

Jaegeuk Kim authored Jun 16, 2016

Let's keep more node/meta pages in run time.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

2c237eba

f2fs: produce more nids and reduce readahead nats · ad4edb83

Jaegeuk Kim authored Jun 16, 2016

Readahead NAT pages are likely to be reclaimed quickly, so it is better
to gather more free nids in advance.

Also, let's keep as many free nids as possible.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

ad4edb83

f2fs: detect host-managed SMR by feature flag · 52763a4b

Jaegeuk Kim authored Jun 13, 2016

If mkfs.f2fs gives a feature flag for host-managed SMR, we can set mode=lfs
by default.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

52763a4b

f2fs: call update_inode_page for orphan inodes · 67c3758d

Jaegeuk Kim authored Jun 13, 2016

Let's store orphan inode pages right away.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

67c3758d

f2fs: report error for f2fs_parent_dir · 3e19886e

Jaegeuk Kim authored Jun 09, 2016

If there is no dentry, we can report its error correctly.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3e19886e

15 Jun, 2016 1 commit

f2fs: find parent dentry correctly · 8be0fea9

Sheng Yong authored Jun 04, 2016

If the dotdot directory is corrupted, its slot may be occupied by another
file. In this case, dentry[1] is not the parent directory. Rename and
cross-rename will update the inode in dentry[1] incorrectly. This
patch finds the dotdot dentry by name.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
[Jaegeuk Kim: remove wrong bug_on]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

8be0fea9
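
The difference between "take slot 1" and "look up by name" can be sketched as follows. `struct dentry_model` and `find_dotdot` are hypothetical and far simpler than the on-disk dentry block.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry: a name and the inode number it points to. */
struct dentry_model {
	const char *name;
	unsigned int ino;
};

/* Find ".." by name instead of assuming it lives at index 1. */
static const struct dentry_model *
find_dotdot(const struct dentry_model *entries, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (entries[i].name && strcmp(entries[i].name, "..") == 0)
			return &entries[i];
	}
	return NULL;	/* no parent entry: the caller must handle this */
}
```

In a corrupted directory such as `{".", "file", ".."}`, indexing slot 1 would return `file`, while the lookup by name still returns the real parent entry or reports that none exists.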

13 Jun, 2016 2 commits

f2fs: fix deadlock in add_link failure · c92737ce

Jaegeuk Kim authored Jun 07, 2016

mkdir                        sync_dirty_inode
 - init_inode_metadata
   - lock_page(node)
   - make_empty_dir
                             - filemap_fdatawrite()
                              - do_writepages
                              - lock_page(data)
                              - write_page(data)
                               - lock_page(node)
   - f2fs_init_acl
    - error
   - truncate_inode_pages
    - lock_page(data)

So, we don't need to truncate data pages in this error case, which will
be done by f2fs_evict_inode.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

c92737ce

f2fs: introduce mode=lfs mount option · 36abef4e

Jaegeuk Kim authored Jun 03, 2016

This mount option forces the original log-structured filesystem behaviour,
so there should be no random writes to the main area.

In particular, this supports host-managed SMR devices.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

36abef4e

08 Jun, 2016 3 commits

f2fs: skip clean segment for gc · aa987273

Jaegeuk Kim authored Jun 06, 2016

If a segment in a section is clean or prefreed, we don't need to get its summary
and do gc.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

aa987273
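
A rough sketch of the skip described above: while walking the segments of a section, any segment without valid blocks is passed over before its summary would be read. `gc_section` is a hypothetical helper, and a zero valid-block count is used here as a stand-in for "clean or prefreed".

```c
#include <stddef.h>

/*
 * Walk one section and return how many segments actually needed GC work.
 * valid_blocks[i] is the number of valid blocks in segment i of the section.
 */
static size_t gc_section(const unsigned int *valid_blocks, size_t segs_per_sec)
{
	size_t collected = 0;

	for (size_t i = 0; i < segs_per_sec; i++) {
		if (valid_blocks[i] == 0)
			continue;	/* nothing to migrate, skip summary read */
		collected++;		/* stands in for reading summary and GC */
	}
	return collected;
}
```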

f2fs: drop any block plugging · 19a5f5e2

Jaegeuk Kim authored Jun 04, 2016

In f2fs, we don't need to keep block plugging for NODE and DATA writes, since
we already merged bios as much as possible.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

19a5f5e2

f2fs: avoid reverse IO order for NODE and DATA · 7dfeaa32

Jaegeuk Kim authored Jun 04, 2016

There is a data race between allocate_data_block() and f2fs_submit_page_mbio(),
which incurs unnecessary reversed bio submission.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

7dfeaa32