Commits · c541a51b8ce81d003b02ed67ad3604a2e6220e3e · nexedi / linux

28 Mar, 2017 1 commit

f2fs: fix wrong max cost initialization · c541a51b

Jaegeuk Kim authored Mar 25, 2017

This patch fixes missing increased max cost caused by a patch that we increased
cose of data segments in greedy algorithm.

Cc: <stable@vger.kernel.org> # v4.10+
Fixes: b9cd2061 "f2fs: node segment is prior to data segment selected victim"
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

c541a51b

25 Mar, 2017 1 commit

f2fs: allow write page cache when writting cp · 59c9081b

Yunlei He authored Mar 13, 2017

This patch allow write data to normal file when writting
new checkpoint.

We relax three limitations for write_begin path:
1. data allocation
2. node allocation
3. variables in checkpoint
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

59c9081b

24 Mar, 2017 7 commits

f2fs: don't reserve additional space in xattr block · 22588f87

Chao Yu authored Mar 23, 2017

In this patch, we change xattr block disk layout as below:

Before:
			xattr node block layout
+---------------------------------------------+---------------+-------------+
|           node block xattr entries          |   reserved    | node footer |
|                  4068 Bytes                 |    4 Bytes    |  24 Bytes   |

			In memory layout
+--------------------+---------------------------------+--------------------+
|    inline xattr    |     node block xattr entries    |      reserved      |
|     200 Bytes      |         4068 Bytes              |      4 Bytes       |

After:
			xattr node block layout
+-------------------------------------------------------------+-------------+
|                  node block xattr entries                   | node footer |
|                         4072 Bytes                          |  24 Bytes   |

			In memory layout
+--------------------+---------------------------------+--------------------+
|    inline xattr    |     node block xattr entries    |      reserved      |
|     200 Bytes      |         4072 Bytes              |      4 Bytes       |

With this change, we don't need to reserve additional space in node block,
just keep reserved space in logical in-memory layout. So that it would help
to enlarge valid free space of xattr node block.

As tested, generic/026 shows max stored xattr entires number increases from
531 to 532 when inline_xattr option is enabled.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

22588f87

f2fs: clean up xattr operation · 89e9eabd

Chao Yu authored Mar 23, 2017

1. don't allocate redundant memory in read_all_xattrs.
2. introduce RESERVED_XATTR_SIZE for cleanup.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

89e9eabd

f2fs: don't track volatile file in dirty inode list · 99f4b917

Chao Yu authored Mar 22, 2017

Don't track volatile file in dirty inode list, otherwise with data_flush
option, background thread will entry into endless loop for flushing
journal file's pages.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

99f4b917

f2fs: show the max number of volatile operations · 648d50ba

Chao Yu authored Mar 22, 2017

This patch adds to show the max number of volatile operations which are
conducting concurrently.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

648d50ba

f2fs: fix race condition in between free nid allocator/initializer · 30a61ddf

Chao Yu authored Mar 22, 2017

In below concurrent case, allocated nid can be loaded into free nid cache
and be allocated again.

Thread A				Thread B
- f2fs_create
 - f2fs_new_inode
  - alloc_nid
   - __insert_nid_to_list(ALLOC_NID_LIST)
					- f2fs_balance_fs_bg
					 - build_free_nids
					  - __build_free_nids
					   - scan_nat_page
					    - add_free_nid
					     - __lookup_nat_cache
 - f2fs_add_link
  - init_inode_metadata
   - new_inode_page
    - new_node_page
     - set_node_addr
 - alloc_nid_done
  - __remove_nid_from_list(ALLOC_NID_LIST)
					     - __insert_nid_to_list(FREE_NID_LIST)

This patch makes nat cache lookup and free nid list operation being atomical
to avoid this race condition.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

30a61ddf

f2fs: use set_page_private marcro in f2fs_trace_pid · 5f4c3dec

Yunlei He authored Mar 22, 2017

Use set_page_private marcro instead of operte page struct
directly
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

5f4c3dec

f2fs: fix recording invalid last_victim · 9897159a

Chao Yu authored Mar 21, 2017

When doing garbage collection, we try to record segment offset which
locates at next one of last victim, using it as the start offset in
next searching.

But in some corner cases, recorded offset may cross the end of main
segment area, it will cause incorrectly searching in dirty_segmap
bitmap. This patch adds modular operation to avoid this issue.
Reported-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

9897159a

22 Mar, 2017 27 commits

f2fs: more reasonable mem_size calculating of ino_entry · 8f73cbb7
Kinglong Mee authored Mar 18, 2017
```
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
```
8f73cbb7

f2fs: calculate the f2fs_stat_info into base_mem · 70874fb3

Kinglong Mee authored Mar 18, 2017

The memory size of f2fs_stat_info also should be calculated.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

70874fb3

f2fs: avoid stat_inc_atomic_write for non-atomic file · 684ca7e5

Kinglong Mee authored Mar 18, 2017

After filemap_write_and_wait_range fail, the FI_ATOMIC_FILE flags is removed,
so that f2fs should not increase the stat of atomic_write.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

684ca7e5

f2fs: add missing INMEM_REVOKE trace enum definition · a58f2449
Chao Yu authored Mar 17, 2017
```
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
```
a58f2449

f2fs: sanity check of crc_offset from raw checkpoint · c6f89dfd

Kinglong Mee authored Mar 15, 2017

The crc_offset towards or beyond the end of block is wrong,
sanity check it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

c6f89dfd

f2fs: cleanup the disk level filename updating · d03ba4cc

Kinglong Mee authored Mar 10, 2017

As discuss with Jaegeuk and Chao,
"Once checkpoint is done, f2fs doesn't need to update there-in filename at all."

The disk-level filename is used only one case,
1. create a file A under a dir
2. sync A
3. godown
4. umount
5. mount (roll_forward)

Only the rename/cross_rename changes the filename, if it happens,
a. between step 1 and 2, the sync A will caused checkpoint, so that,
   the roll_forward at step 5 never happens.
b. after step 2, the roll_forward happens, file A will roll forward
   to the result as after step 1.

So that, any updating the disk filename is useless, just cleanup it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

d03ba4cc

f2fs: cover update_free_nid_bitmap with nid_list_lock · 346fe752

Chao Yu authored Mar 13, 2017

free_nid_bitmap and free_nid_count in update_free_nid_bitmap should be
updated atomically, use nid_list_lock cover them to avoid race in
concurrent scenario.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

346fe752

f2fs: fix bad prefetchw of NULL page · a83d50bc

Kinglong Mee authored Mar 13, 2017

For f2fs_read_data_pages, the f2fs_mpage_readpages gets "page == NULL",
so that, the prefetchw(&page->flags) is operated on NULL.

Fixes: f1e88660 ("f2fs: expose f2fs_mpage_readpages")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

a83d50bc

f2fs: clear FI_DATA_EXIST flag in truncate_inline_inode · bd4667cb

Kinglong Mee authored Mar 10, 2017

Clear FI_DATA_EXIST flag atomically in truncate_inline_inode, and
the return value from truncate_inline_inode isn't used, remove it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

bd4667cb

f2fs: move mnt_want_write_file after arguments checking · d7563861

Kinglong Mee authored Mar 10, 2017

It's needless of mnt_want_write_file for arguments checking.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

d7563861

f2fs: check new size by inode_newsize_ok in f2fs_insert_range · 46e82fb1

Kinglong Mee authored Mar 10, 2017

The inode_newsize_ok is better than only checking the maxbytes,
eg. the rlimit etc.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

46e82fb1

f2fs: avoid copy date to user-space if move file range fail · 3cecfa5f

Kinglong Mee authored Mar 10, 2017

If move file range return error, the data copied to user-space is duplicate.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3cecfa5f

f2fs: drop duplicate new_size assign in f2fs_zero_range · 087d3d8b
Kinglong Mee authored Mar 10, 2017
```
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
```
087d3d8b

f2fs: adjust the way of calculating nat block · 8a6aa325

Fan Li authored Mar 08, 2017

use a slightly simpler expression to calculate nat block with nid.
Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

8a6aa325

f2fs: add fault injection on f2fs_truncate · 14b44d23

Jaegeuk Kim authored Mar 09, 2017

Inject a fault during f2fs_truncate().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

14b44d23

f2fs: check range before defragment · 1941d7bc

Sheng Yong authored Mar 08, 2017

This patch checks the parameter range passed by ioctl to void that range
exceeds the max_file_blocks limit.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

1941d7bc

f2fs: use parameter max_items instead of PIDVEC_SIZE · b0beab50

Sheng Yong authored Mar 08, 2017

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

b0beab50

f2fs: add a punch discard command function · 3d6a650f

Yunlei He authored Mar 02, 2017

This patch add a function to punch discard command if one segment
reuse before discard. Split this segment from multi-segments discard
range, and discard the left bigger range.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3d6a650f

f2fs: allocate a bio for discarding when actually issuing it · c81abe34
Jaegeuk Kim authored Mar 07, 2017
```
Let's allocate a bio when issuing discard commands later.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
```
c81abe34

f2fs: skip writeback meta pages if cp_mutex acquire failed · a29d0e0b

Yunlei He authored Mar 01, 2017

Skip writeback meta pages if cp_mutex lock acquire failed, cp will
flush dirty pages instead.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

a29d0e0b

f2fs: show more precise message on orphan recovery failure · 5ce4738a

Jaegeuk Kim authored Mar 07, 2017

This case is not caused by fsck.f2fs. User needs to retry mount.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

5ce4738a

f2fs: remove dead macro PGOFS_OF_NEXT_DNODE · fc7d5bc4

Kinglong Mee authored Feb 28, 2017

Fixes: 3cf45747 ("f2fs: introduce get_next_page_offset to speed up SEEK_DATA")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

fc7d5bc4

f2fs: drop duplicate radix tree lookup of nat_entry_set · 0b28b71e

Kinglong Mee authored Feb 28, 2017

The nat entry is listed from the set list for freeing,
it's duplicate to do radix tree lookup again.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
[Jaegeuk Kim: remove unnecessary f2fs_bug_on]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

0b28b71e

f2fs: make sure trace all f2fs_issue_flush · 20fda56b

Kinglong Mee authored Mar 04, 2017

The root device's issue flush trace is missing,
add it and tracing the result from submit.

Fixes d50aaeec ("f2fs: show actual device info in tracepoints")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

20fda56b

f2fs: don't allow volatile writes for non-regular file · 8ff0971f

Chao Yu authored Mar 17, 2017

Now f2fs only supports volatile writes for journal db regular file.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

8ff0971f

f2fs: don't allow atomic writes for not regular files · e811898c

Jaegeuk Kim authored Mar 17, 2017

The atomic writes only supports regular files for database.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

e811898c

f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer · 8c242db9

Jaegeuk Kim authored Mar 17, 2017

When I forced to enable atomic operations intentionally, I could hit the below
panic, since we didn't clear page->private in f2fs_invalidate_page called by
file truncation.

The panic occurs due to NULL mapping having page->private.

BUG: unable to handle kernel paging request at ffffffffffffffff
IP: drop_buffers+0x38/0xe0
PGD 5d00c067
PUD 5d00e067
PMD 0
CPU: 3 PID: 1648 Comm: fsstress Tainted: G      D    OE   4.10.0+ #5
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: ffff9151952863c0 task.stack: ffffaaec40db4000
RIP: 0010:drop_buffers+0x38/0xe0
RSP: 0018:ffffaaec40db74c8 EFLAGS: 00010292
Call Trace:
 ? page_referenced+0x8b/0x170
 try_to_free_buffers+0xc5/0xe0
 try_to_release_page+0x49/0x50
 shrink_page_list+0x8bc/0x9f0
 shrink_inactive_list+0x1dd/0x500
 ? shrink_active_list+0x2c0/0x430
 shrink_node_memcg+0x5eb/0x7c0
 shrink_node+0xe1/0x320
 do_try_to_free_pages+0xef/0x2e0
 try_to_free_pages+0xe9/0x190
 __alloc_pages_slowpath+0x390/0xe70
 __alloc_pages_nodemask+0x291/0x2b0
 alloc_pages_current+0x95/0x140
 __page_cache_alloc+0xc4/0xe0
 pagecache_get_page+0xab/0x2a0
 grab_cache_page_write_begin+0x20/0x40
 get_read_data_page+0x2e6/0x4c0 [f2fs]
 ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs]
 ? truncate_data_blocks_range+0x238/0x2b0 [f2fs]
 get_lock_data_page+0x30/0x190 [f2fs]
 __exchange_data_block+0xaaf/0xf40 [f2fs]
 f2fs_fallocate+0x418/0xd00 [f2fs]
 vfs_fallocate+0x157/0x220
 SyS_fallocate+0x48/0x80
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Chao Yu: use INMEM_INVALIDATE for better tracing]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

8c242db9

21 Mar, 2017 4 commits

f2fs: build stat_info before orphan inode recovery · aa51d08a

Jaegeuk Kim authored Mar 07, 2017

f2fs_sync_fs() -> write_checkpoint() calls stat_inc_cp_count(sbi->stat_info),
which needs stat_info allocation.
Otherwise, we can hit:

[254042.598623]  ? count_shadow_nodes+0xa0/0xa0
[254042.598633]  f2fs_sync_fs+0x65/0xd0 [f2fs]
[254042.598645]  f2fs_balance_fs_bg+0xe4/0x1c0 [f2fs]
[254042.598657]  f2fs_write_node_pages+0x34/0x1a0 [f2fs]
[254042.598664]  ? pagevec_lookup_entries+0x1e/0x30
[254042.598673]  do_writepages+0x1e/0x30
[254042.598682]  __writeback_single_inode+0x45/0x330
[254042.598688]  writeback_single_inode+0xd7/0x190
[254042.598694]  write_inode_now+0x86/0xa0
[254042.598699]  iput+0x122/0x200
[254042.598709]  f2fs_fill_super+0xd4a/0x14d0 [f2fs]
[254042.598717]  mount_bdev+0x184/0x1c0
[254042.598934]  ? f2fs_commit_super+0x100/0x100 [f2fs]
[254042.599142]  f2fs_mount+0x15/0x20 [f2fs]
[254042.599349]  mount_fs+0x39/0x160
[254042.599554]  ? __alloc_percpu+0x15/0x20
[254042.599759]  vfs_kern_mount+0x67/0x110
[254042.599972]  do_mount+0x1bb/0xc80
[254042.600175]  ? memdup_user+0x42/0x60
[254042.600380]  SyS_mount+0x83/0xd0
[254042.600583]  entry_SYSCALL_64_fastpath+0x1e/0xad
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

aa51d08a

f2fs: fix the fault of calculating blkstart twice · 10a875f8

Kinglong Mee authored Mar 08, 2017

When the zone type is BLK_ZONE_TYPE_CONVENTIONAL, the blkstart is
calculated twice.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

10a875f8

f2fs: fix the fault of checking F2FS_LINK_MAX for rename inode · e2f0e962

Kinglong Mee authored Mar 04, 2017

The parent directory's nlink will change, not the inode.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

e2f0e962

f2fs: don't allow to get pino when filename is encrypted · 4f1bca9f

Jaegeuk Kim authored Mar 07, 2017

After renaming an encrypted file, we have no way to get its
encrypted filename from its dentry.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

4f1bca9f