- 23 Feb, 2016 40 commits
-
-
Chao Yu authored
With a partition which was formated as multi segments in one section, we stated incorrectly for count of gc operation. e.g., for a partition with segs_per_sec = 4 cat /sys/kernel/debug/f2fs/status GC calls: 208 (BG: 7) - data segments : 104 (52) - node segments : 104 (24) GC called count should be (104 (data segs) + 104 (node segs)) / 4 = 52, rather than 208. Fix it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch avoids to remain inefficient victim segment number selected by a victim. For example, if all the dirty segments has same valid blocks, we can get the victim segments descending order due to keeping wrong last segment number. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Shawn Lin authored
f2fs_convert_inline_page introduce what read_inline_data already does for copying out the inline data from inode_page. We can use read_inline_data instead to simplify the code. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
When doing test with fstests/generic/068 in inline_dentry enabled f2fs, following oops dmesg will be reported: ------------[ cut here ]------------ WARNING: CPU: 5 PID: 11841 at fs/inode.c:273 drop_nlink+0x49/0x50() Modules linked in: f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state CPU: 5 PID: 11841 Comm: fsstress Tainted: G O 4.5.0-rc1 #45 Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013 0000000000000111 ffff88009cdf7ae8 ffffffff813e5944 0000000000002e41 0000000000000000 0000000000000111 0000000000000000 ffff88009cdf7b28 ffffffff8106a587 ffff88009cdf7b58 ffff8804078fe180 ffff880374a64e00 Call Trace: [<ffffffff813e5944>] dump_stack+0x48/0x64 [<ffffffff8106a587>] warn_slowpath_common+0x97/0xe0 [<ffffffff8106a5ea>] warn_slowpath_null+0x1a/0x20 [<ffffffff81231039>] drop_nlink+0x49/0x50 [<ffffffffa07b95b4>] f2fs_rename2+0xe04/0x10c0 [f2fs] [<ffffffff81231ff1>] ? lock_two_nondirectories+0x81/0x90 [<ffffffff813f454d>] ? lockref_get+0x1d/0x30 [<ffffffff81220f70>] vfs_rename+0x2e0/0x640 [<ffffffff8121f9db>] ? lookup_dcache+0x3b/0xd0 [<ffffffff810b8e41>] ? update_fast_ctr+0x21/0x40 [<ffffffff8134ff12>] ? security_path_rename+0xa2/0xd0 [<ffffffff81224af6>] SYSC_renameat2+0x4b6/0x540 [<ffffffff810ba8ed>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff810022ba>] ? exit_to_usermode_loop+0x7a/0xd0 [<ffffffff817e0ade>] ? int_ret_from_sys_call+0x52/0x9f [<ffffffff810bdc90>] ? trace_hardirqs_on_caller+0x100/0x1c0 [<ffffffff81224b8e>] SyS_renameat2+0xe/0x10 [<ffffffff8121f08e>] SyS_rename+0x1e/0x20 [<ffffffff817e0957>] entry_SYSCALL_64_fastpath+0x12/0x6f ---[ end trace 2b31e17995404e42 ]--- This is because: in the same inline directory, when we renaming one file from source name to target name which is not existed, once space of inline dentry is not enough, inline conversion will be triggered, after that all data in inline dentry will be moved to normal dentry page. After attaching the new entry in coverted dentry page, still we try to remove old entry in original inline dentry, since old entry has been moved, so it obviously doesn't make any effect, result in remaining old entry in converted dentry page. Now, we have two valid dentries pointed to the same inode which has nlink value of 1, deleting them both, above warning appears. This issue can be reproduced easily as below steps: 1. mount f2fs with inline_dentry option 2. mkdir dir 3. touch 180 files named [001-180] in dir 4. rename dir/180 dir/181 5. rm dir/180 dir/181 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
Should check and show correct return value of update_dent_inode in ->rename. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Shawn Lin authored
>From the function name of get_valid_checkpoint, it seems to return the valid cp or NULL for caller to check. If no valid one is found, f2fs_fill_super will print the err log. But if get_valid_checkpoint get one valid(the return value indicate that it's valid, however actually it is invalid after sanity checking), then print another similar err log. That seems strange. Let's keep sanity checking inside the procedure of geting valid cp. Another improvement we gained from this move is that even the large volume is supported, we check the cp in advanced to skip the following procedure if failing the sanity checking. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Shawn Lin authored
read_raw_super_block was introduced to help find the first valid superblock. Commit da554e48 ("f2fs: recovering broken superblock during mount") changed the behaviour to read both of them and check whether need the recovery flag or not. So the comment before this function isn't consistent with what it actually does. Also, the origin code use two tags to round the err cases, which isn't so readable. So this patch amend the comment and slightly reorganize it. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
When lookuping nat entry in cache_nat_entry, if we fail to hit nat cache, we try to load nat entries a) from journal of current segment cache or b) from NAT pages for updating, during the process, write lock of nat_tree_lock will be held to avoid inconsistent condition in between nid cache and nat cache caused by racing among nat entry shrinker, checkpointer, nat entry updater. But this way may cause low efficient when updating nat cache, because it serializes accessing in journal cache or reading NAT pages. Here, we reorder lock and update flow as below to enhance accessing concurrency: - get_node_info - down_read(nat_tree_lock) - lookup nat cache --- hit -> unlock & return - lookup journal cache --- hit -> unlock & goto update - up_read(nat_tree_lock) update: - down_write(nat_tree_lock) - cache_nat_entry - lookup nat cache --- nohit -> update - up_write(nat_tree_lock) Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
In curseg cache, f2fs caches two different parts: - datas of current summay block, i.e. summary entries, footer info. - journal info, i.e. sparse nat/sit entries or io stat info. With this approach, 1) it may cause higher lock contention when we access or update both of the parts of cache since we use the same mutex lock curseg_mutex to protect the cache. 2) current summary block with last journal info will be writebacked into device as a normal summary block when flushing, however, we treat journal info as valid one only in current summary, so most normal summary blocks contain junk journal data, it wastes remaining space of summary block. So, in order to fix above issues, we split curseg cache into two parts: a) current summary block, protected by original mutex lock curseg_mutex b) journal cache, protected by newly introduced r/w semaphore journal_rwsem When loading curseg cache during ->mount, we store summary info and journal info into different caches; When doing checkpoint, we combine datas of two cache into current summary block for persisting. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
Try to use block plug in more place as below to let process cache bios as much as possbile, in order to reduce lock overhead of queue in IO scheduler. 1) sync_meta_pages 2) ra_meta_pages 3) f2fs_balance_fs_bg Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
Introduce a new structure f2fs_journal to wrap journal info in struct f2fs_summary_block for readability. struct f2fs_journal { union { __le16 n_nats; __le16 n_sits; }; union { struct nat_journal nat_j; struct sit_journal sit_j; struct f2fs_extra_info info; }; } __packed; struct f2fs_summary_block { struct f2fs_summary entries[ENTRIES_IN_SUM]; struct f2fs_journal journal; struct summary_footer footer; } __packed; Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch adopts f2fs with codes of ext4, it removes unneeded memory allocation in creating/accessing path of symlink. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch syncs f2fs with commit abdd438b ("ext4 crypto: handle unexpected lack of encryption keys") from ext4. Fix up attempts by users to try to write to a file when they don't have access to the encryption key. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch syncs f2fs with commit 6bc445e0 ("ext4 crypto: make sure the encryption info is initialized on opendir(2)") from ext4. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
f2fs support atomic write with following semantics: 1. open db file 2. ioctl start atomic write 3. (write db file) * n 4. ioctl commit atomic write 5. close db file With this flow we can avoid file becoming corrupted when abnormal power cut, because we hold data of transaction in referenced pages linked in inmem_pages list of inode, but without setting them dirty, so these data won't be persisted unless we commit them in step 4. But we should still hold journal db file in memory by using volatile write, because our semantics of 'atomic write support' is incomplete, in step 4, we could fail to submit all dirty data of transaction, once partial dirty data was committed in storage, then after a checkpoint & abnormal power-cut, db file will be corrupted forever. So this patch tries to improve atomic write flow by adding a revoking flow, once inner error occurs in committing, this gives another chance to try to revoke these partial submitted data of current transaction, it makes committing operation more like aotmical one. If we're not lucky, once revoking operation was failed, EAGAIN will be reported to user for suggesting doing the recovery with held journal file, or retrying current transaction again. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
Split drop_inmem_pages from commit_inmem_pages for code readability, and prepare for the following modification. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch fixes to eliminate garbage name lengths in dentries in order to provide correct answers of readdir. For example, if a valid dentry consists of: bitmap : 1 1 1 1 len : 32 0 x 0, readdir can start with second bit_pos having len = 0. Or, it can start with third bit_pos having garbage. In both of cases, we should avoid to try filling dentries. So, this patch not only removes any garbage length, but also avoid entering zero length case in readdir. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch fixes wrong adoption on fname padding. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch is to fix misused error number. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch adopts: ext4 crypto: add missing locking for keyring_key access Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch adopts: ext4 crypto: check for too-short encrypted file names An encrypted file name should never be shorter than an 16 bytes, the AES block size. The 3.10 crypto layer will oops and crash the kernel if ciphertext shorter than the block size is passed to it. Fortunately, in modern kernels the crypto layer will not crash the kernel in this scenario, but nevertheless, it represents a corrupted directory, and we should detect it and mark the file system as corrupted so that e2fsck can fix this. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch adopts: ext4 crypto: ext4_page_crypto() doesn't need a encryption context Since ext4_page_crypto() doesn't need an encryption context (at least not any more), this allows us to simplify a number function signature and also allows us to avoid needing to allocate a context in ext4_block_write_begin(). It also means we no longer need a separate ext4_decrypt_one() function. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch adopts: ext4 crypto: fix spelling typo in comment Signed-off-by: Laurent Navet <laurent.navet@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch adopts: ext4 crypto: replace some BUG_ON()'s with error checks Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
When finsert is doing with dirting pages, we should increase i_size right away. Otherwise, the moved page is able to be dropped by the following filemap_write_and_wait_range before updating i_size. Especially, it can be done by if ((page->index >= end_index + 1) || !offset) goto out; in f2fs_write_data_page. This should resolve the below xfstests/091 failure reported by Dave. $ diff -u tests/generic/091.out /home/dave/src/xfstests-dev/results//f2fs/generic/091.out.bad --- tests/generic/091.out 2014-01-20 16:57:33.000000000 +1100 +++ /home/dave/src/xfstests-dev/results//f2fs/generic/091.out.bad 2016-02-08 15:21:02.701375087 +1100 @@ -1,7 +1,18 @@ QA output created by 091 fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W +mapped writes DISABLED +skipping insert range behind EOF +skipping insert range behind EOF +truncating to largest ever: 0x11e00 +dowrite: write: Invalid argument +LOG DUMP (7 total operations): +1( 1 mod 256): SKIPPED (no operation) +2( 2 mod 256): SKIPPED (no operation) +3( 3 mod 256): FALLOC 0x2e0f2 thru 0x3134a (0x3258 bytes) PAST_EOF +4( 4 mod 256): SKIPPED (no operation) +5( 5 mod 256): SKIPPED (no operation) +6( 6 mod 256): TRUNCATE UP from 0x0 to 0x11e00 +7( 7 mod 256): WRITE 0x73400 thru 0x79fff (0x6c00 bytes) HOLE +Log of operations saved to "/mnt/test/junk.fsxops"; replay with --replay-ops +Correct content saved for comparison +(maybe hexdump "/mnt/test/junk" vs "/mnt/test/junk.fsxgood") Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch preallocates data blocks for buffered aio writes. With this patch, we can avoid redundant locking and unlocking of node pages given consecutive aio request. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch moves preallocation code for direct IOs into f2fs_file_write_iter. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Yunlei He authored
fix missing skip pages info in f2fs_writepages trace event. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
f2fs use single bio buffer per type data (META/NODE/DATA) for caching writes locating in continuous block address as many as possible, after submitting, these writes may be still cached in bio buffer, so we have to flush cached writes in bio buffer by calling f2fs_submit_merged_bio. Unfortunately, in the scenario of high concurrency, bio buffer could be flushed by someone else before we submit it as below reasons: a) there is no space in bio buffer. b) add a request of different type (SYNC, ASYNC). c) add a discontinuous block address. For this condition, f2fs_submit_merged_bio will be devastating, because it could break the following merging of writes in bio buffer, split one big bio into two smaller one. This patch introduces f2fs_submit_merged_bio_cond which can do a conditional submitting with bio buffer, before submitting it will judge whether: - page in DATA type bio buffer is matching with specified page; - page in DATA type bio buffer is belong to specified inode; - page in NODE type bio buffer is belong to specified inode; If there is no eligible page in bio buffer, we will skip submitting step, result in gaining more chance to merge consecutive block IOs in bio cache. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch fixes confilct on page->private value between f2fs_trace_pid and atomic page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
Sometimes, if cp_error is set, there remains under-writeback pages, resulting in kernel hang in put_super. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
Likewise f2fs_write_cache_pages, let's do for node and meta pages too. Especially, for node blocks, we should do this before marking its fsync and dentry flags. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Sheng Yong authored
Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch makes f2fs_map_blocks supporting returning next potential page offset which skips hole region in indirect tree of inode, and use it to speed up fiemap in handling big hole case. Test method: xfs_io -f /mnt/f2fs/file -c "pwrite 1099511627776 4096" time xfs_io -f /mnt/f2fs/file -c "fiemap -v" Before: time xfs_io -f /mnt/f2fs/file -c "fiemap -v" /mnt/f2fs/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2147483647]: hole 2147483648 1: [2147483648..2147483655]: 81920..81927 8 0x1 real 3m3.518s user 0m0.000s sys 3m3.456s After: time xfs_io -f /mnt/f2fs/file -c "fiemap -v" /mnt/f2fs/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2147483647]: hole 2147483648 1: [2147483648..2147483655]: 81920..81927 8 0x1 real 0m0.008s user 0m0.000s sys 0m0.008s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
When seeking data in ->llseek, if we encounter a big hole which covers several dnode pages, we will try to seek data from index of page which is the first page of next dnode page, at most we could skip searching (ADDRS_PER_BLOCK - 1) pages. However it's still not efficient, because if our indirect/double-indirect pointer are NULL, there are no dnode page locate in the tree indirect/ double-indirect pointer point to, it's not necessary to search the whole region. This patch introduces get_next_page_offset to calculate next page offset based on current searching level and max searching level returned from get_dnode_of_data, with this, we could skip searching the entire area indirect or double-indirect node block is not exist. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
There are redundant pointer conversion in following call stack: - at position a, inode was been converted to f2fs_file_info. - at position b, f2fs_file_info was been converted to inode again. - truncate_blocks(inode,..) - fi = F2FS_I(inode) ---a - ADDRS_PER_PAGE(node_page, fi) - addrs_per_inode(fi) - inode = &fi->vfs_inode ---b - f2fs_has_inline_xattr(inode) - fi = F2FS_I(inode) - is_inode_flag_set(fi,..) In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and addrs_per_inode to acept parameter with type of inode pointer. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch uses existing function f2fs_map_block to simplify implementation of __allocate_data_blocks. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
In f2fs_map_blocks, we use duplicated codes to handle first block mapping and the following blocks mapping, it's unnecessary. This patch simplifies f2fs_map_blocks to avoid using copied codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Shuoran Liu authored
This patch introduces lifetime IO write statistics exposed to the sysfs interface. The write IO amount is obtained from block layer, accumulated in the file system and stored in the hot node summary of checkpoint. Signed-off-by: Shuoran Liu <liushuoran@huawei.com> Signed-off-by: Pengyang Hou <houpengyang@huawei.com> [Jaegeuk Kim: add sysfs documentation] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
It needs to give a chance to be rescheduled while shrinking slab entries. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-