- 23 Feb, 2016 35 commits
-
-
Jaegeuk Kim authored
This patch moves preallocation code for direct IOs into f2fs_file_write_iter. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Yunlei He authored
fix missing skip pages info in f2fs_writepages trace event. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
f2fs use single bio buffer per type data (META/NODE/DATA) for caching writes locating in continuous block address as many as possible, after submitting, these writes may be still cached in bio buffer, so we have to flush cached writes in bio buffer by calling f2fs_submit_merged_bio. Unfortunately, in the scenario of high concurrency, bio buffer could be flushed by someone else before we submit it as below reasons: a) there is no space in bio buffer. b) add a request of different type (SYNC, ASYNC). c) add a discontinuous block address. For this condition, f2fs_submit_merged_bio will be devastating, because it could break the following merging of writes in bio buffer, split one big bio into two smaller one. This patch introduces f2fs_submit_merged_bio_cond which can do a conditional submitting with bio buffer, before submitting it will judge whether: - page in DATA type bio buffer is matching with specified page; - page in DATA type bio buffer is belong to specified inode; - page in NODE type bio buffer is belong to specified inode; If there is no eligible page in bio buffer, we will skip submitting step, result in gaining more chance to merge consecutive block IOs in bio cache. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch fixes confilct on page->private value between f2fs_trace_pid and atomic page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
Sometimes, if cp_error is set, there remains under-writeback pages, resulting in kernel hang in put_super. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
Likewise f2fs_write_cache_pages, let's do for node and meta pages too. Especially, for node blocks, we should do this before marking its fsync and dentry flags. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Sheng Yong authored
Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch makes f2fs_map_blocks supporting returning next potential page offset which skips hole region in indirect tree of inode, and use it to speed up fiemap in handling big hole case. Test method: xfs_io -f /mnt/f2fs/file -c "pwrite 1099511627776 4096" time xfs_io -f /mnt/f2fs/file -c "fiemap -v" Before: time xfs_io -f /mnt/f2fs/file -c "fiemap -v" /mnt/f2fs/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2147483647]: hole 2147483648 1: [2147483648..2147483655]: 81920..81927 8 0x1 real 3m3.518s user 0m0.000s sys 3m3.456s After: time xfs_io -f /mnt/f2fs/file -c "fiemap -v" /mnt/f2fs/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2147483647]: hole 2147483648 1: [2147483648..2147483655]: 81920..81927 8 0x1 real 0m0.008s user 0m0.000s sys 0m0.008s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
When seeking data in ->llseek, if we encounter a big hole which covers several dnode pages, we will try to seek data from index of page which is the first page of next dnode page, at most we could skip searching (ADDRS_PER_BLOCK - 1) pages. However it's still not efficient, because if our indirect/double-indirect pointer are NULL, there are no dnode page locate in the tree indirect/ double-indirect pointer point to, it's not necessary to search the whole region. This patch introduces get_next_page_offset to calculate next page offset based on current searching level and max searching level returned from get_dnode_of_data, with this, we could skip searching the entire area indirect or double-indirect node block is not exist. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
There are redundant pointer conversion in following call stack: - at position a, inode was been converted to f2fs_file_info. - at position b, f2fs_file_info was been converted to inode again. - truncate_blocks(inode,..) - fi = F2FS_I(inode) ---a - ADDRS_PER_PAGE(node_page, fi) - addrs_per_inode(fi) - inode = &fi->vfs_inode ---b - f2fs_has_inline_xattr(inode) - fi = F2FS_I(inode) - is_inode_flag_set(fi,..) In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and addrs_per_inode to acept parameter with type of inode pointer. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch uses existing function f2fs_map_block to simplify implementation of __allocate_data_blocks. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
In f2fs_map_blocks, we use duplicated codes to handle first block mapping and the following blocks mapping, it's unnecessary. This patch simplifies f2fs_map_blocks to avoid using copied codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Shuoran Liu authored
This patch introduces lifetime IO write statistics exposed to the sysfs interface. The write IO amount is obtained from block layer, accumulated in the file system and stored in the hot node summary of checkpoint. Signed-off-by: Shuoran Liu <liushuoran@huawei.com> Signed-off-by: Pengyang Hou <houpengyang@huawei.com> [Jaegeuk Kim: add sysfs documentation] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
It needs to give a chance to be rescheduled while shrinking slab entries. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Hou Pengyang authored
On the worst case, we need to scan the whole radix tree and every rb-tree to free the victimed extent_nodes when shrinking. Pengyang initially introduced a victim_list to record the victimed extent_nodes, and free these extent_nodes by just scanning a list. Later, Chao Yu enhances the original patch to improve memory footprint by removing victim list. The policy of lru list shrinking becomes: 1) lock lru list's lock 2) trylock extent tree's lock 3) remove extent node from lru list 4) unlock lru list's lock 5) do shrink 6) repeat 1) to 5) Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
If en has empty list pointer, it will be freed sooner, so we don't need to set cached_en with it. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch moves extent_node list operations to be handled together with its rbtree operations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Hou Pengyang authored
There are three steps to free an extent node: 1) list_del_init, 2)__detach_extent_node, 3) kmem_cache_free In path f2fs_destroy_extent_tree, 1->2->3 to free a node, But in path f2fs_update_extent_tree_range, it is 2->1->3. This patch makes all the order to be: 1->2->3 It makes sense, since in the next patch, we import a victim list in the path shrink_extent_tree, we could check if the extent_node is in the victim list by checking the list_empty(). So it is necessary to put 1) first. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
We need to use wq_has_sleeper including smp_mb to consider cp_wait concurrency. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Fan Li authored
variable nsearched in get_victim_by_default() indicates the number of dirty segments we already checked. There are 2 problems about the way it updates: 1. When p.ofs_unit is greater than 1, the victim we find consists of multiple segments, possibly more than 1 dirty segment. But nsearched always increases by 1. 2. If segments have been found but not been chosen, nsearched won't increase. So even we have checked all dirty segments, nsearched may still less than p.max_search. All these problems could cause unnecessary search after all dirty segments have already been checked. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Yunlei He authored
no need to wait inline file page writeback for no one use it, so this patch delete unnecessary wait. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
In write_begin, if storage supports stable_page, we don't need to wait for writeback to update its contents. This patch introduces to use wait_for_stable_page instead of wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
If we configure section consist of multiple segments, foreground GC will do the garbage collection with following approach: for each segment in victim section blk_start_plug for each valid block in segment write out by OPU method submit bio cache <--- blk_finish_plug <--- There are two issue: 1) for most of the time, 'submit bio cache' will break the merging in current bio buffer from writes of next segments, making a smaller bio submitting. 2) block plug only cover IO submitting in one segment, which reduce opportunity of merging IOs in plug with multiple segments. So refactor the code as below structure to strive for biggest opportunity of merging IOs: blk_start_plug for each segment in victim section for each valid block in segment write out by OPU method submit bio cache blk_finish_plug Test method: 1. mkfs.f2fs -s 8 /dev/sdX 2. touch 32 files 3. write 2M data into each file 4. punch 1.5M data from offset 0 for each file 5. trigger foreground gc through ioctl Before patch, there are totoally 40 bios submitted. f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880 f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880 f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880 f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880 f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768 ----repeat for 8 times After patch, there are totally 35 bios submitted. f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880 ----repeat 34 times f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
If end_io gets an error, we don't need to set the page as dirty, since we already set f2fs_stop_checkpoint which will not flush any data. This will resolve the following warning. ====================================================== [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] 4.4.0+ #9 Tainted: G O ------------------------------------------------------ xfs_io/26773 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: (&(&sbi->inode_lock[i])->rlock){+.+...}, at: [<ffffffffc025483f>] update_dirty_page+0x6f/0xd0 [f2fs] and this task is already holding: (&(&q->__queue_lock)->rlock){-.-.-.}, at: [<ffffffff81396ea2>] blk_queue_bio+0x422/0x490 which would create a new lock dependency: (&(&q->__queue_lock)->rlock){-.-.-.} -> (&(&sbi->inode_lock[i])->rlock){+.+...} Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
In write_begin, if there is an inline_data, f2fs loads it into 0'th data page. Since it's the read path, we don't need to sync its inode page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
In write_end, we don't need to sync inode page at every time. Instead, we can expect f2fs_write_inode will update later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
The sceanrio is: 1. create fully node blocks 2. flush node blocks 3. write inline_data for all the node blocks again 4. flush node blocks redundantly So, this patch tries to flush inline_data when flushing node blocks. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
We should consider data block allocation to trigger f2fs_balance_fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
The scenario is: 1. create lots of node blocks 2. sync 3. write lots of inline_data -> got panic due to no free space In that case, we should flush node blocks when writing inline_data in #3, and trigger gc as well. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
If there are many writepages calls by multiple threads in background, we don't need to serialize to merge all the bios, since it's background. In such the case, it'd better to run writepages concurrently. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Jaegeuk Kim authored
This patch removes needless condition variable. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
get_new_segment starts from current segment position, tries to search a free segment among its right neighbors locate in same section. But previously our search area was set as [current segment, max segment], which means we have to search to more bits in free_segmap bitmap for some worse cases. So here we correct the search area to [current segment, last segment in section] to avoid unnecessary searching. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
This patch exports a new sysfs entry 'dirty_nat_ratio' to control threshold of dirty nat entries, if current ratio exceeds configured threshold, checkpoint will be triggered in f2fs_balance_fs_bg for flushing dirty nats. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
When testing f2fs with xfstest, generic/251 is stuck for long time, the case uses below serials to obtain fresh released space in device, in order to prepare for following fstrim test. 1. rm -rf /mnt/dir 2. mkdir /mnt/dir/ 3. cp -axT `pwd`/ /mnt/dir/ 4. goto 1 During preparing step, all nat entries will be cached in nat cache, most of them are dirty entries with invalid blkaddr, which means nodes related to these entries have been truncated, and they could be reused after the dirty entries been checkpointed. However, there was no checkpoint been triggered, so nid allocators (e.g. mkdir, creat) will run into long journey of iterating all NAT pages, looking for free nids in alloc_nid->build_free_nids. Here, in f2fs_balance_fs_bg we give another chance to do checkpoint to flush nat entries for reusing them in free nid cache when dirty entry count exceeds 10% of max count. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
Operations in is_merged_page is related to inner bio cache, move it to data.c. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 22 Feb, 2016 5 commits
-
-
Linus Torvalds authored
Merge tag 'trace-fixes-v4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Two more small fixes. One is by Yang Shi who added a READ_ONCE_NOCHECK() to the scan of the stack made by the stack tracer. As the stack tracer scans the entire kernel stack, KASAN triggers seeing it as a "stack out of bounds" error. As the scan is looking at the contents of the stack from parent functions. The NOCHECK() tells KASAN that this is done on purpose, and is not some kind of stack overflow. The second fix is to the ftrace selftests, to retrieve the PID of executed commands from the shell with '$!' and not by parsing 'jobs'" * tag 'trace-fixes-v4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing, kasan: Silence Kasan warning in check_stack of stack_tracer ftracetest: Fix instance test to use proper shell command for pids
-
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds authored
Pull xen bug fixes from David Vrabel: - Two scsiback fixes (resource leak and spurious warning). - Fix DMA mapping of compound pages on arm/arm64. - Fix some pciback regressions in MSI-X handling. - Fix a pcifront crash due to some uninitialize state. * tag 'for-linus-4.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pcifront: Fix mysterious crashes when NUMA locality information was extracted. xen/pcifront: Report the errors better. xen/pciback: Save the number of MSI-X entries to be copied later. xen/pciback: Check PF instead of VF for PCI_COMMAND_MEMORY xen: fix potential integer overflow in queue_reply xen/arm: correctly handle DMA mapping of compound pages xen/scsiback: avoid warnings when adding multiple LUNs to a domain xen/scsiback: correct frontend counting
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull networking fixes from David Miller: "Looks like a lot, but mostly driver fixes scattered all over as usual. Of note: 1) Add conditional sched in nf conntrack in cleanup to avoid NMI watchdogs. From Florian Westphal. 2) Fix deadlock in nfnetlink cttimeout, also from Floarian. 3) Fix handling of slaves in bonding ARP monitor validation, from Jay Vosburgh. 4) Callers of ip_cmsg_send() are responsible for freeing IP options, some were not doing so. Fix from Eric Dumazet. 5) Fix per-cpu bugs in mvneta driver, from Gregory CLEMENT. 6) Fix vlan handling in mv88e6xxx DSA driver, from Vivien Didelot. 7) bcm7xxx PHY driver bug fixes from Florian Fainelli. 8) Avoid unaligned accesses to protocol headers wrt. GRE, from Alexander Duyck. 9) SKB leaks and other problems in arc_emac driver, from Alexander Kochetkov. 10) tcp_v4_inbound_md5_hash() releases listener socket instead of request socket on error path, oops. Fix from Eric Dumazet. 11) Missing socket release in pppoe_rcv_core() that seems to have existed basically forever. From Guillaume Nault. 12) Missing slave_dev unregister in dsa_slave_create() error path, from Florian Fainelli. 13) crypto_alloc_hash() never returns NULL, fix return value check in __tcp_alloc_md5sig_pool. From Insu Yun. 14) Properly expire exception route entries in ipv4, from Xin Long. 15) Fix races in tcp/dccp listener socket dismantle, from Eric Dumazet. 16) Don't set IFF_TX_SKB_SHARING in vxlan, geneve, or GRE, it's not legal. These drivers modify the SKB on transmit. From Jiri Benc. 17) Fix regression in the initialziation of netdev->tx_queue_len. From Phil Sutter. 18) Missing unlock in tipc_nl_add_bc_link() error path, from Insu Yun. 19) SCTP port hash sizing does not properly ensure that table is a power of two in size. From Neil Horman. 20) Fix initializing of software copy of MAC address in fmvj18x_cs driver, from Ken Kawasaki" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (129 commits) bnx2x: Fix 84833 phy command handler bnx2x: Fix led setting for 84858 phy. bnx2x: Correct 84858 PHY fw version bnx2x: Fix 84833 RX CRC bnx2x: Fix link-forcing for KR2 net: ethernet: davicom: fix devicetree irq resource fmvj18x_cs: fix incorrect indexing of dev->dev_addr[] when copying the MAC address Driver: Vmxnet3: Update Rx ring 2 max size net: netcp: rework the code for get/set sw_data in dma desc soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_data net: ti: netcp: restore get/set_pad_info() functionality MAINTAINERS: Drop myself as xen netback maintainer sctp: Fix port hash table size computation can: ems_usb: Fix possible tx overflow Bluetooth: hci_core: Avoid mixing up req_complete and req_complete_skb net: bcmgenet: Fix internal PHY link state af_unix: Don't use continue to re-execute unix_stream_read_generic loop unix_diag: fix incorrect sign extension in unix_lookup_by_ino bnxt_en: Failure to update PHY is not fatal condition. bnxt_en: Remove unnecessary call to update PHY settings. ...
-
Linus Torvalds authored
Merge tag 'hwmon-for-linus-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: "Two fixes headed for stable: - Remove an unnecessary speed_index lookup for thermal hook in the gpio-fan driver. The unnecessary speed lookup can hog the system. - Handle negative conversion values correctly in the ads1015 driver" * tag 'hwmon-for-linus-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (gpio-fan) Remove un-necessary speed_index lookup for thermal hook hwmon: (ads1015) Handle negative conversion values correctly
-
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds authored
Pull rdma fixes from Doug Ledford: "One ocrdma fix: - The new CQ API support was added to ocrdma, but they got the arming logic wrong, so without this, transfers eventually fail when they fail to arm the interrupt properly under load Two related fixes for mlx4: - When we added the 64bit extended counters support to the core IB code, they forgot to update the RoCE side of the mlx4 driver (the IB side they properly updated). I debated whether or not to include these patches as they could be considered feature enablement patches, but the existing code will blindy copy the 32bit counters, whether any counters were requested at all (a bug). These two patches make it (a) check to see that counters were requested and (b) copy the right counters (the 64bit support is new, the 32bit is not). For that reason I went ahead and took them" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/mlx4: Add support for the port info class for RoCE ports IB/mlx4: Add support for extended counters over RoCE ports RDMA/ocrdma: Fix arm logic to align with new cq API
-