Commit 6abaa83c authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'f2fs-for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this cycle, we've addressed some performance issues such as lock
  contention, misbehaving compress_cache, allowing extent_cache for
  compressed files, and new sysfs to adjust ra_size for fadvise.

  In order to diagnose the performance issues quickly, we also added an
  iostat which shows the IO latencies periodically.

  On the stability side, we've found two memory leakage cases in the
  error path in compression flow. And, we've also fixed various corner
  cases in fiemap, quota, checkpoint=disable, zstd, and so on.

  Enhancements:
   - avoid long checkpoint latency by releasing nat_tree_lock
   - collect and show iostats periodically
   - support extent_cache for compressed files
   - add a sysfs entry to manage ra_size given fadvise(POSIX_FADV_SEQUENTIAL)
   - report f2fs GC status via sysfs
   - add discard_unit=%s in mount option to handle zoned device

  Bug fixes:
   - fix two memory leakages when an error happens in the compressed IO flow
   - fix commpress_cache to get the right LBA
   - fix fiemap to deal with compressed case correctly
   - fix wrong EIO returns due to SBI_NEED_FSCK
   - fix missing writes when enabling checkpoint back
   - fix quota deadlock
   - fix zstd level mount option

  In addition to the above major updates, we've cleaned up several code
  paths such as dio, unnecessary operations, debugfs/f2fs/status, sanity
  check, and typos"

* tag 'f2fs-for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
  f2fs: should put a page beyond EOF when preparing a write
  f2fs: deallocate compressed pages when error happens
  f2fs: enable realtime discard iff device supports discard
  f2fs: guarantee to write dirty data when enabling checkpoint back
  f2fs: fix to unmap pages from userspace process in punch_hole()
  f2fs: fix unexpected ENOENT comes from f2fs_map_blocks()
  f2fs: fix to account missing .skipped_gc_rwsem
  f2fs: adjust unlock order for cleanup
  f2fs: Don't create discard thread when device doesn't support realtime discard
  f2fs: rebuild nat_bits during umount
  f2fs: introduce periodic iostat io latency traces
  f2fs: separate out iostat feature
  f2fs: compress: do sanity check on cluster
  f2fs: fix description about main_blkaddr node
  f2fs: convert S_IRUGO to 0444
  f2fs: fix to keep compatibility of fault injection interface
  f2fs: support fault injection for f2fs_kmem_cache_alloc()
  f2fs: compress: allow write compress released file after truncate to zero
  f2fs: correct comment in segment.h
  f2fs: improve sbi status info in debugfs/f2fs/status
  ...
parents 0961f0c0 9605f75c
......@@ -41,8 +41,7 @@ Description: This parameter controls the number of prefree segments to be
What: /sys/fs/f2fs/<disk>/main_blkaddr
Date: November 2019
Contact: "Ramon Pantin" <pantin@google.com>
Description:
Shows first block address of MAIN area.
Description: Shows first block address of MAIN area.
What: /sys/fs/f2fs/<disk>/ipu_policy
Date: November 2013
......@@ -493,3 +492,23 @@ Contact: "Chao Yu" <yuchao0@huawei.com>
Description: When ATGC is on, it controls age threshold to bypass GCing young
candidates whose age is not beyond the threshold, by default it was
initialized as 604800 seconds (equals to 7 days).
What: /sys/fs/f2fs/<disk>/gc_reclaimed_segments
Date: July 2021
Contact: "Daeho Jeong" <daehojeong@google.com>
Description: Show how many segments have been reclaimed by GC during a specific
GC mode (0: GC normal, 1: GC idle CB, 2: GC idle greedy,
3: GC idle AT, 4: GC urgent high, 5: GC urgent low)
You can re-initialize this value to "0".
What: /sys/fs/f2fs/<disk>/gc_segment_mode
Date: July 2021
Contact: "Daeho Jeong" <daehojeong@google.com>
Description: You can control for which gc mode the "gc_reclaimed_segments" node shows.
Refer to the description of the modes in "gc_reclaimed_segments".
What: /sys/fs/f2fs/<disk>/seq_file_ra_mul
Date: July 2021
Contact: "Daeho Jeong" <daehojeong@google.com>
Description: You can control the multiplier value of bdi device readahead window size
between 2 (default) and 256 for POSIX_FADV_SEQUENTIAL advise option.
......@@ -185,6 +185,7 @@ fault_type=%d Support configuring fault injection type, should be
FAULT_KVMALLOC 0x000000002
FAULT_PAGE_ALLOC 0x000000004
FAULT_PAGE_GET 0x000000008
FAULT_ALLOC_BIO 0x000000010 (obsolete)
FAULT_ALLOC_NID 0x000000020
FAULT_ORPHAN 0x000000040
FAULT_BLOCK 0x000000080
......@@ -195,6 +196,7 @@ fault_type=%d Support configuring fault injection type, should be
FAULT_CHECKPOINT 0x000001000
FAULT_DISCARD 0x000002000
FAULT_WRITE_IO 0x000004000
FAULT_SLAB_ALLOC 0x000008000
=================== ===========
mode=%s Control block allocation mode which supports "adaptive"
and "lfs". In "lfs" mode, there should be no random
......@@ -312,6 +314,14 @@ inlinecrypt When possible, encrypt/decrypt the contents of encrypted
Documentation/block/inline-encryption.rst.
atgc Enable age-threshold garbage collection, it provides high
effectiveness and efficiency on background GC.
discard_unit=%s Control discard unit, the argument can be "block", "segment"
and "section", issued discard command's offset/size will be
aligned to the unit, by default, "discard_unit=block" is set,
so that small discard functionality is enabled.
For blkzoned device, "discard_unit=section" will be set by
default, it is helpful for large sized SMR or ZNS devices to
reduce memory cost by getting rid of fs metadata supports small
discard.
======================== ============================================================
Debugfs Entries
......@@ -857,8 +867,11 @@ Compression implementation
directly in order to guarantee potential data updates later to the space.
Instead, the main goal is to reduce data writes to flash disk as much as
possible, resulting in extending disk life time as well as relaxing IO
congestion. Alternatively, we've added ioctl interface to reclaim compressed
space and show it to user after putting the immutable bit.
congestion. Alternatively, we've added ioctl(F2FS_IOC_RELEASE_COMPRESS_BLOCKS)
interface to reclaim compressed space and show it to user after putting the
immutable bit. Immutable bit, after release, it doesn't allow writing/mmaping
on the file, until reserving compressed space via
ioctl(F2FS_IOC_RESERVE_COMPRESS_BLOCKS) or truncating filesize to zero.
Compress metadata layout::
......
......@@ -105,6 +105,13 @@ config F2FS_FS_LZO
help
Support LZO compress algorithm, if unsure, say Y.
config F2FS_FS_LZORLE
bool "LZO-RLE compression support"
depends on F2FS_FS_LZO
default y
help
Support LZO-RLE compress algorithm, if unsure, say Y.
config F2FS_FS_LZ4
bool "LZ4 compression support"
depends on F2FS_FS_COMPRESSION
......@@ -114,7 +121,6 @@ config F2FS_FS_LZ4
config F2FS_FS_LZ4HC
bool "LZ4HC compression support"
depends on F2FS_FS_COMPRESSION
depends on F2FS_FS_LZ4
default y
help
......@@ -128,10 +134,11 @@ config F2FS_FS_ZSTD
help
Support ZSTD compress algorithm, if unsure, say Y.
config F2FS_FS_LZORLE
bool "LZO-RLE compression support"
depends on F2FS_FS_COMPRESSION
depends on F2FS_FS_LZO
config F2FS_IOSTAT
bool "F2FS IO statistics information"
depends on F2FS_FS
default y
help
Support LZO-RLE compress algorithm, if unsure, say Y.
Support getting IO statistics through sysfs and printing out periodic
IO statistics tracepoint events. You have to turn on "iostat_enable"
sysfs node to enable this feature.
......@@ -9,3 +9,4 @@ f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o
f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
f2fs-$(CONFIG_FS_VERITY) += verity.o
f2fs-$(CONFIG_F2FS_FS_COMPRESSION) += compress.o
f2fs-$(CONFIG_F2FS_IOSTAT) += iostat.o
......@@ -18,6 +18,7 @@
#include "f2fs.h"
#include "node.h"
#include "segment.h"
#include "iostat.h"
#include <trace/events/f2fs.h>
#define DEFAULT_CHECKPOINT_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 3))
......@@ -465,16 +466,29 @@ static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino,
unsigned int devidx, int type)
{
struct inode_management *im = &sbi->im[type];
struct ino_entry *e, *tmp;
struct ino_entry *e = NULL, *new = NULL;
tmp = f2fs_kmem_cache_alloc(ino_entry_slab, GFP_NOFS);
if (type == FLUSH_INO) {
rcu_read_lock();
e = radix_tree_lookup(&im->ino_root, ino);
rcu_read_unlock();
}
retry:
if (!e)
new = f2fs_kmem_cache_alloc(ino_entry_slab,
GFP_NOFS, true, NULL);
radix_tree_preload(GFP_NOFS | __GFP_NOFAIL);
spin_lock(&im->ino_lock);
e = radix_tree_lookup(&im->ino_root, ino);
if (!e) {
e = tmp;
if (!new) {
spin_unlock(&im->ino_lock);
goto retry;
}
e = new;
if (unlikely(radix_tree_insert(&im->ino_root, ino, e)))
f2fs_bug_on(sbi, 1);
......@@ -492,8 +506,8 @@ static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino,
spin_unlock(&im->ino_lock);
radix_tree_preload_end();
if (e != tmp)
kmem_cache_free(ino_entry_slab, tmp);
if (new && e != new)
kmem_cache_free(ino_entry_slab, new);
}
static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
......@@ -1289,12 +1303,20 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, struct cp_control *cpc)
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
unsigned long flags;
spin_lock_irqsave(&sbi->cp_lock, flags);
if (cpc->reason & CP_UMOUNT) {
if (le32_to_cpu(ckpt->cp_pack_total_block_count) >
sbi->blocks_per_seg - NM_I(sbi)->nat_bits_blocks) {
clear_ckpt_flags(sbi, CP_NAT_BITS_FLAG);
f2fs_notice(sbi, "Disable nat_bits due to no space");
} else if (!is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG) &&
f2fs_nat_bitmap_enabled(sbi)) {
f2fs_enable_nat_bits(sbi);
set_ckpt_flags(sbi, CP_NAT_BITS_FLAG);
f2fs_notice(sbi, "Rebuild and enable nat_bits");
}
}
if ((cpc->reason & CP_UMOUNT) &&
le32_to_cpu(ckpt->cp_pack_total_block_count) >
sbi->blocks_per_seg - NM_I(sbi)->nat_bits_blocks)
disable_nat_bits(sbi, false);
spin_lock_irqsave(&sbi->cp_lock, flags);
if (cpc->reason & CP_TRIMMED)
__set_ckpt_flags(ckpt, CP_TRIMMED_FLAG);
......@@ -1480,7 +1502,8 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
start_blk = __start_cp_next_addr(sbi);
/* write nat bits */
if (enabled_nat_bits(sbi, cpc)) {
if ((cpc->reason & CP_UMOUNT) &&
is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG)) {
__u64 cp_ver = cur_cp_version(ckpt);
block_t blk;
......@@ -1639,8 +1662,11 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
/* write cached NAT/SIT entries to NAT/SIT area */
err = f2fs_flush_nat_entries(sbi, cpc);
if (err)
if (err) {
f2fs_err(sbi, "f2fs_flush_nat_entries failed err:%d, stop checkpoint", err);
f2fs_bug_on(sbi, !f2fs_cp_error(sbi));
goto stop;
}
f2fs_flush_sit_entries(sbi, cpc);
......@@ -1648,10 +1674,13 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
f2fs_save_inmem_curseg(sbi);
err = do_checkpoint(sbi, cpc);
if (err)
if (err) {
f2fs_err(sbi, "do_checkpoint failed err:%d, stop checkpoint", err);
f2fs_bug_on(sbi, !f2fs_cp_error(sbi));
f2fs_release_discard_addrs(sbi);
else
} else {
f2fs_clear_prefree_segments(sbi, cpc);
}
f2fs_restore_inmem_curseg(sbi);
stop:
......
......@@ -28,7 +28,8 @@ static void *page_array_alloc(struct inode *inode, int nr)
unsigned int size = sizeof(struct page *) * nr;
if (likely(size <= sbi->page_array_slab_size))
return kmem_cache_zalloc(sbi->page_array_slab, GFP_NOFS);
return f2fs_kmem_cache_alloc(sbi->page_array_slab,
GFP_F2FS_ZERO, false, F2FS_I_SB(inode));
return f2fs_kzalloc(sbi, size, GFP_NOFS);
}
......@@ -898,6 +899,54 @@ static bool cluster_has_invalid_data(struct compress_ctx *cc)
return false;
}
bool f2fs_sanity_check_cluster(struct dnode_of_data *dn)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
unsigned int cluster_size = F2FS_I(dn->inode)->i_cluster_size;
bool compressed = dn->data_blkaddr == COMPRESS_ADDR;
int cluster_end = 0;
int i;
char *reason = "";
if (!compressed)
return false;
/* [..., COMPR_ADDR, ...] */
if (dn->ofs_in_node % cluster_size) {
reason = "[*|C|*|*]";
goto out;
}
for (i = 1; i < cluster_size; i++) {
block_t blkaddr = data_blkaddr(dn->inode, dn->node_page,
dn->ofs_in_node + i);
/* [COMPR_ADDR, ..., COMPR_ADDR] */
if (blkaddr == COMPRESS_ADDR) {
reason = "[C|*|C|*]";
goto out;
}
if (compressed) {
if (!__is_valid_data_blkaddr(blkaddr)) {
if (!cluster_end)
cluster_end = i;
continue;
}
/* [COMPR_ADDR, NULL_ADDR or NEW_ADDR, valid_blkaddr] */
if (cluster_end) {
reason = "[C|N|N|V]";
goto out;
}
}
}
return false;
out:
f2fs_warn(sbi, "access invalid cluster, ino:%lu, nid:%u, ofs_in_node:%u, reason:%s",
dn->inode->i_ino, dn->nid, dn->ofs_in_node, reason);
set_sbi_flag(sbi, SBI_NEED_FSCK);
return true;
}
static int __f2fs_cluster_blocks(struct inode *inode,
unsigned int cluster_idx, bool compr)
{
......@@ -915,6 +964,11 @@ static int __f2fs_cluster_blocks(struct inode *inode,
goto fail;
}
if (f2fs_sanity_check_cluster(&dn)) {
ret = -EFSCORRUPTED;
goto fail;
}
if (dn.data_blkaddr == COMPRESS_ADDR) {
int i;
......@@ -1228,7 +1282,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
fio.version = ni.version;
cic = kmem_cache_zalloc(cic_entry_slab, GFP_NOFS);
cic = f2fs_kmem_cache_alloc(cic_entry_slab, GFP_F2FS_ZERO, false, sbi);
if (!cic)
goto out_put_dnode;
......@@ -1340,12 +1394,6 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
for (--i; i >= 0; i--)
fscrypt_finalize_bounce_page(&cc->cpages[i]);
for (i = 0; i < cc->nr_cpages; i++) {
if (!cc->cpages[i])
continue;
f2fs_compress_free_page(cc->cpages[i]);
cc->cpages[i] = NULL;
}
out_put_cic:
kmem_cache_free(cic_entry_slab, cic);
out_put_dnode:
......@@ -1356,6 +1404,12 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
else
f2fs_unlock_op(sbi);
out_free:
for (i = 0; i < cc->nr_cpages; i++) {
if (!cc->cpages[i])
continue;
f2fs_compress_free_page(cc->cpages[i]);
cc->cpages[i] = NULL;
}
page_array_free(cc->inode, cc->cpages, cc->nr_cpages);
cc->cpages = NULL;
return -EAGAIN;
......@@ -1506,7 +1560,8 @@ struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc)
pgoff_t start_idx = start_idx_of_cluster(cc);
int i;
dic = kmem_cache_zalloc(dic_entry_slab, GFP_NOFS);
dic = f2fs_kmem_cache_alloc(dic_entry_slab, GFP_F2FS_ZERO,
false, F2FS_I_SB(cc->inode));
if (!dic)
return ERR_PTR(-ENOMEM);
......@@ -1666,6 +1721,30 @@ void f2fs_put_page_dic(struct page *page)
f2fs_put_dic(dic);
}
/*
* check whether cluster blocks are contiguous, and add extent cache entry
* only if cluster blocks are logically and physically contiguous.
*/
unsigned int f2fs_cluster_blocks_are_contiguous(struct dnode_of_data *dn)
{
bool compressed = f2fs_data_blkaddr(dn) == COMPRESS_ADDR;
int i = compressed ? 1 : 0;
block_t first_blkaddr = data_blkaddr(dn->inode, dn->node_page,
dn->ofs_in_node + i);
for (i += 1; i < F2FS_I(dn->inode)->i_cluster_size; i++) {
block_t blkaddr = data_blkaddr(dn->inode, dn->node_page,
dn->ofs_in_node + i);
if (!__is_valid_data_blkaddr(blkaddr))
break;
if (first_blkaddr + i - (compressed ? 1 : 0) != blkaddr)
return 0;
}
return compressed ? i - 1 : i;
}
const struct address_space_operations f2fs_compress_aops = {
.releasepage = f2fs_release_page,
.invalidatepage = f2fs_invalidate_page,
......
This diff is collapsed.
......@@ -323,11 +323,27 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
#endif
}
static char *s_flag[] = {
[SBI_IS_DIRTY] = " fs_dirty",
[SBI_IS_CLOSE] = " closing",
[SBI_NEED_FSCK] = " need_fsck",
[SBI_POR_DOING] = " recovering",
[SBI_NEED_SB_WRITE] = " sb_dirty",
[SBI_NEED_CP] = " need_cp",
[SBI_IS_SHUTDOWN] = " shutdown",
[SBI_IS_RECOVERED] = " recovered",
[SBI_CP_DISABLED] = " cp_disabled",
[SBI_CP_DISABLED_QUICK] = " cp_disabled_quick",
[SBI_QUOTA_NEED_FLUSH] = " quota_need_flush",
[SBI_QUOTA_SKIP_FLUSH] = " quota_skip_flush",
[SBI_QUOTA_NEED_REPAIR] = " quota_need_repair",
[SBI_IS_RESIZEFS] = " resizefs",
};
static int stat_show(struct seq_file *s, void *v)
{
struct f2fs_stat_info *si;
int i = 0;
int j;
int i = 0, j = 0;
mutex_lock(&f2fs_stat_mutex);
list_for_each_entry(si, &f2fs_stat_list, stat_list) {
......@@ -337,7 +353,13 @@ static int stat_show(struct seq_file *s, void *v)
si->sbi->sb->s_bdev, i++,
f2fs_readonly(si->sbi->sb) ? "RO": "RW",
is_set_ckpt_flags(si->sbi, CP_DISABLED_FLAG) ?
"Disabled": (f2fs_cp_error(si->sbi) ? "Error": "Good"));
"Disabled" : (f2fs_cp_error(si->sbi) ? "Error" : "Good"));
if (si->sbi->s_flag) {
seq_puts(s, "[SBI:");
for_each_set_bit(j, &si->sbi->s_flag, 32)
seq_puts(s, s_flag[j]);
seq_puts(s, "]\n");
}
seq_printf(s, "[SB: 1] [CP: 2] [SIT: %d] [NAT: %d] ",
si->sit_area_segs, si->nat_area_segs);
seq_printf(s, "[SSA: %d] [MAIN: %d",
......@@ -450,6 +472,15 @@ static int stat_show(struct seq_file *s, void *v)
si->data_segs, si->bg_data_segs);
seq_printf(s, " - node segments : %d (%d)\n",
si->node_segs, si->bg_node_segs);
seq_printf(s, " - Reclaimed segs : Normal (%d), Idle CB (%d), "
"Idle Greedy (%d), Idle AT (%d), "
"Urgent High (%d), Urgent Low (%d)\n",
si->sbi->gc_reclaimed_segs[GC_NORMAL],
si->sbi->gc_reclaimed_segs[GC_IDLE_CB],
si->sbi->gc_reclaimed_segs[GC_IDLE_GREEDY],
si->sbi->gc_reclaimed_segs[GC_IDLE_AT],
si->sbi->gc_reclaimed_segs[GC_URGENT_HIGH],
si->sbi->gc_reclaimed_segs[GC_URGENT_LOW]);
seq_printf(s, "Try to move %d blocks (BG: %d)\n", si->tot_blks,
si->bg_data_blks + si->bg_node_blks);
seq_printf(s, " - data blocks : %d (%d)\n", si->data_blks,
......@@ -611,7 +642,7 @@ void __init f2fs_create_root_stats(void)
#ifdef CONFIG_DEBUG_FS
f2fs_debugfs_root = debugfs_create_dir("f2fs", NULL);
debugfs_create_file("status", S_IRUGO, f2fs_debugfs_root, NULL,
debugfs_create_file("status", 0444, f2fs_debugfs_root, NULL,
&stat_fops);
#endif
}
......
......@@ -83,8 +83,8 @@ int f2fs_init_casefolded_name(const struct inode *dir,
struct super_block *sb = dir->i_sb;
if (IS_CASEFOLDED(dir)) {
fname->cf_name.name = kmem_cache_alloc(f2fs_cf_name_slab,
GFP_NOFS);
fname->cf_name.name = f2fs_kmem_cache_alloc(f2fs_cf_name_slab,
GFP_NOFS, false, F2FS_SB(sb));
if (!fname->cf_name.name)
return -ENOMEM;
fname->cf_name.len = utf8_casefold(sb->s_encoding,
......@@ -1000,6 +1000,7 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d,
struct f2fs_sb_info *sbi = F2FS_I_SB(d->inode);
struct blk_plug plug;
bool readdir_ra = sbi->readdir_ra == 1;
bool found_valid_dirent = false;
int err = 0;
bit_pos = ((unsigned long)ctx->pos % d->max);
......@@ -1014,13 +1015,15 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d,
de = &d->dentry[bit_pos];
if (de->name_len == 0) {
if (found_valid_dirent || !bit_pos) {
printk_ratelimited(
"%sF2FS-fs (%s): invalid namelen(0), ino:%u, run fsck to fix.",
KERN_WARNING, sbi->sb->s_id,
le32_to_cpu(de->ino));
set_sbi_flag(sbi, SBI_NEED_FSCK);
}
bit_pos++;
ctx->pos = start_pos + bit_pos;
printk_ratelimited(
"%sF2FS-fs (%s): invalid namelen(0), ino:%u, run fsck to fix.",
KERN_WARNING, sbi->sb->s_id,
le32_to_cpu(de->ino));
set_sbi_flag(sbi, SBI_NEED_FSCK);
continue;
}
......@@ -1063,6 +1066,7 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d,
f2fs_ra_node_page(sbi, le32_to_cpu(de->ino));
ctx->pos = start_pos + bit_pos;
found_valid_dirent = true;
}
out:
if (readdir_ra)
......
......@@ -239,7 +239,7 @@ static struct extent_node *__attach_extent_node(struct f2fs_sb_info *sbi,
{
struct extent_node *en;
en = kmem_cache_alloc(extent_node_slab, GFP_ATOMIC);
en = f2fs_kmem_cache_alloc(extent_node_slab, GFP_ATOMIC, false, sbi);
if (!en)
return NULL;
......@@ -292,7 +292,8 @@ static struct extent_tree *__grab_extent_tree(struct inode *inode)
mutex_lock(&sbi->extent_tree_lock);
et = radix_tree_lookup(&sbi->extent_tree_root, ino);
if (!et) {
et = f2fs_kmem_cache_alloc(extent_tree_slab, GFP_NOFS);
et = f2fs_kmem_cache_alloc(extent_tree_slab,
GFP_NOFS, true, NULL);
f2fs_radix_tree_insert(&sbi->extent_tree_root, ino, et);
memset(et, 0, sizeof(struct extent_tree));
et->ino = ino;
......@@ -661,6 +662,47 @@ static void f2fs_update_extent_tree_range(struct inode *inode,
f2fs_mark_inode_dirty_sync(inode, true);
}
#ifdef CONFIG_F2FS_FS_COMPRESSION
void f2fs_update_extent_tree_range_compressed(struct inode *inode,
pgoff_t fofs, block_t blkaddr, unsigned int llen,
unsigned int c_len)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct extent_tree *et = F2FS_I(inode)->extent_tree;
struct extent_node *en = NULL;
struct extent_node *prev_en = NULL, *next_en = NULL;
struct extent_info ei;
struct rb_node **insert_p = NULL, *insert_parent = NULL;
bool leftmost = false;
trace_f2fs_update_extent_tree_range(inode, fofs, blkaddr, llen);
/* it is safe here to check FI_NO_EXTENT w/o et->lock in ro image */
if (is_inode_flag_set(inode, FI_NO_EXTENT))
return;
write_lock(&et->lock);
en = (struct extent_node *)f2fs_lookup_rb_tree_ret(&et->root,
(struct rb_entry *)et->cached_en, fofs,
(struct rb_entry **)&prev_en,
(struct rb_entry **)&next_en,
&insert_p, &insert_parent, false,
&leftmost);
if (en)
goto unlock_out;
set_extent_info(&ei, fofs, blkaddr, llen);
ei.c_len = c_len;
if (!__try_merge_extent_node(sbi, et, &ei, prev_en, next_en))
__insert_extent_tree(sbi, et, &ei,
insert_p, insert_parent, leftmost);
unlock_out:
write_unlock(&et->lock);
}
#endif
unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink)
{
struct extent_tree *et, *next;
......
This diff is collapsed.
......@@ -23,6 +23,7 @@
#include <linux/nls.h>
#include <linux/sched/signal.h>
#include <linux/fileattr.h>
#include <linux/fadvise.h>
#include "f2fs.h"
#include "node.h"
......@@ -30,6 +31,7 @@
#include "xattr.h"
#include "acl.h"
#include "gc.h"
#include "iostat.h"
#include <trace/events/f2fs.h>
#include <uapi/linux/f2fs.h>
......@@ -258,8 +260,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
};
unsigned int seq_id = 0;
if (unlikely(f2fs_readonly(inode->i_sb) ||
is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
if (unlikely(f2fs_readonly(inode->i_sb)))
return 0;
trace_f2fs_sync_file_enter(inode);
......@@ -273,7 +274,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
ret = file_write_and_wait_range(file, start, end);
clear_inode_flag(inode, FI_NEED_IPU);
if (ret) {
if (ret || is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
trace_f2fs_sync_file_exit(inode, cp_reason, datasync, ret);
return ret;
}
......@@ -298,6 +299,18 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
f2fs_exist_written_data(sbi, ino, UPDATE_INO))
goto flush_out;
goto out;
} else {
/*
* for OPU case, during fsync(), node can be persisted before
* data when lower device doesn't support write barrier, result
* in data corruption after SPO.
* So for strict fsync mode, force to use atomic write sematics
* to keep write order in between data/node and last node to
* avoid potential data corruption.
*/
if (F2FS_OPTION(sbi).fsync_mode ==
FSYNC_MODE_STRICT && !atomic)
atomic = true;
}
go_write:
/*
......@@ -737,6 +750,14 @@ int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock)
return err;
#ifdef CONFIG_F2FS_FS_COMPRESSION
/*
* For compressed file, after release compress blocks, don't allow write
* direct, but we should allow write direct after truncate to zero.
*/
if (f2fs_compressed_file(inode) && !free_from
&& is_inode_flag_set(inode, FI_COMPRESS_RELEASED))
clear_inode_flag(inode, FI_COMPRESS_RELEASED);
if (from != free_from) {
err = f2fs_truncate_partial_cluster(inode, from, lock);
if (err)
......@@ -1082,7 +1103,6 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len)
}
if (pg_start < pg_end) {
struct address_space *mapping = inode->i_mapping;
loff_t blk_start, blk_end;
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
......@@ -1092,16 +1112,15 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len)
blk_end = (loff_t)pg_end << PAGE_SHIFT;
down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
filemap_invalidate_lock(mapping);
filemap_invalidate_lock(inode->i_mapping);
truncate_inode_pages_range(mapping, blk_start,
blk_end - 1);
truncate_pagecache_range(inode, blk_start, blk_end - 1);
f2fs_lock_op(sbi);
ret = f2fs_truncate_hole(inode, pg_start, pg_end);
f2fs_unlock_op(sbi);
filemap_invalidate_unlock(mapping);
filemap_invalidate_unlock(inode->i_mapping);
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
}
}
......@@ -3473,8 +3492,8 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg)
released_blocks += ret;
}
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
filemap_invalidate_unlock(inode->i_mapping);
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
out:
inode_unlock(inode);
......@@ -3626,8 +3645,8 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg)
reserved_blocks += ret;
}
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
filemap_invalidate_unlock(inode->i_mapping);
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
if (ret >= 0) {
clear_inode_flag(inode, FI_COMPRESS_RELEASED);
......@@ -4290,7 +4309,7 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
* back to buffered IO.
*/
if (!f2fs_force_buffered_io(inode, iocb, from) &&
allow_outplace_dio(inode, iocb, from))
f2fs_lfs_mode(F2FS_I_SB(inode)))
goto write;
}
preallocated = true;
......@@ -4330,6 +4349,34 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
return ret;
}
static int f2fs_file_fadvise(struct file *filp, loff_t offset, loff_t len,
int advice)
{
struct inode *inode;
struct address_space *mapping;
struct backing_dev_info *bdi;
if (advice == POSIX_FADV_SEQUENTIAL) {
inode = file_inode(filp);
if (S_ISFIFO(inode->i_mode))
return -ESPIPE;
mapping = filp->f_mapping;
if (!mapping || len < 0)
return -EINVAL;
bdi = inode_to_bdi(mapping->host);
filp->f_ra.ra_pages = bdi->ra_pages *
F2FS_I_SB(inode)->seq_file_ra_mul;
spin_lock(&filp->f_lock);
filp->f_mode &= ~FMODE_RANDOM;
spin_unlock(&filp->f_lock);
return 0;
}
return generic_fadvise(filp, offset, len, advice);
}
#ifdef CONFIG_COMPAT
struct compat_f2fs_gc_range {
u32 sync;
......@@ -4458,4 +4505,5 @@ const struct file_operations f2fs_file_operations = {
#endif
.splice_read = generic_file_splice_read,
.splice_write = iter_file_splice_write,
.fadvise = f2fs_file_fadvise,
};
......@@ -19,6 +19,7 @@
#include "node.h"
#include "segment.h"
#include "gc.h"
#include "iostat.h"
#include <trace/events/f2fs.h>
static struct kmem_cache *victim_entry_slab;
......@@ -371,7 +372,8 @@ static struct victim_entry *attach_victim_entry(struct f2fs_sb_info *sbi,
struct atgc_management *am = &sbi->am;
struct victim_entry *ve;
ve = f2fs_kmem_cache_alloc(victim_entry_slab, GFP_NOFS);
ve = f2fs_kmem_cache_alloc(victim_entry_slab,
GFP_NOFS, true, NULL);
ve->mtime = mtime;
ve->segno = segno;
......@@ -849,7 +851,8 @@ static void add_gc_inode(struct gc_inode_list *gc_list, struct inode *inode)
iput(inode);
return;
}
new_ie = f2fs_kmem_cache_alloc(f2fs_inode_entry_slab, GFP_NOFS);
new_ie = f2fs_kmem_cache_alloc(f2fs_inode_entry_slab,
GFP_NOFS, true, NULL);
new_ie->inode = inode;
f2fs_radix_tree_insert(&gc_list->iroot, inode->i_ino, new_ie);
......@@ -1497,8 +1500,10 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
int err;
if (S_ISREG(inode->i_mode)) {
if (!down_write_trylock(&fi->i_gc_rwsem[READ]))
if (!down_write_trylock(&fi->i_gc_rwsem[READ])) {
sbi->skipped_gc_rwsem++;
continue;
}
if (!down_write_trylock(
&fi->i_gc_rwsem[WRITE])) {
sbi->skipped_gc_rwsem++;
......@@ -1646,6 +1651,7 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
force_migrate);
stat_inc_seg_count(sbi, type, gc_type);
sbi->gc_reclaimed_segs[sbi->gc_mode]++;
migrated++;
freed:
......@@ -1747,7 +1753,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
round++;
}
if (gc_type == FG_GC && seg_freed)
if (gc_type == FG_GC)
sbi->cur_victim_sec = NULL_SEGNO;
if (sync)
......
// SPDX-License-Identifier: GPL-2.0
/*
* f2fs iostat support
*
* Copyright 2021 Google LLC
* Author: Daeho Jeong <daehojeong@google.com>
*/
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include <linux/seq_file.h>
#include "f2fs.h"
#include "iostat.h"
#include <trace/events/f2fs.h>
#define NUM_PREALLOC_IOSTAT_CTXS 128
static struct kmem_cache *bio_iostat_ctx_cache;
static mempool_t *bio_iostat_ctx_pool;
int __maybe_unused iostat_info_seq_show(struct seq_file *seq, void *offset)
{
struct super_block *sb = seq->private;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
time64_t now = ktime_get_real_seconds();
if (!sbi->iostat_enable)
return 0;
seq_printf(seq, "time: %-16llu\n", now);
/* print app write IOs */
seq_puts(seq, "[WRITE]\n");
seq_printf(seq, "app buffered: %-16llu\n",
sbi->rw_iostat[APP_BUFFERED_IO]);
seq_printf(seq, "app direct: %-16llu\n",
sbi->rw_iostat[APP_DIRECT_IO]);
seq_printf(seq, "app mapped: %-16llu\n",
sbi->rw_iostat[APP_MAPPED_IO]);
/* print fs write IOs */
seq_printf(seq, "fs data: %-16llu\n",
sbi->rw_iostat[FS_DATA_IO]);
seq_printf(seq, "fs node: %-16llu\n",
sbi->rw_iostat[FS_NODE_IO]);
seq_printf(seq, "fs meta: %-16llu\n",
sbi->rw_iostat[FS_META_IO]);
seq_printf(seq, "fs gc data: %-16llu\n",
sbi->rw_iostat[FS_GC_DATA_IO]);
seq_printf(seq, "fs gc node: %-16llu\n",
sbi->rw_iostat[FS_GC_NODE_IO]);
seq_printf(seq, "fs cp data: %-16llu\n",
sbi->rw_iostat[FS_CP_DATA_IO]);
seq_printf(seq, "fs cp node: %-16llu\n",
sbi->rw_iostat[FS_CP_NODE_IO]);
seq_printf(seq, "fs cp meta: %-16llu\n",
sbi->rw_iostat[FS_CP_META_IO]);
/* print app read IOs */
seq_puts(seq, "[READ]\n");
seq_printf(seq, "app buffered: %-16llu\n",
sbi->rw_iostat[APP_BUFFERED_READ_IO]);
seq_printf(seq, "app direct: %-16llu\n",
sbi->rw_iostat[APP_DIRECT_READ_IO]);
seq_printf(seq, "app mapped: %-16llu\n",
sbi->rw_iostat[APP_MAPPED_READ_IO]);
/* print fs read IOs */
seq_printf(seq, "fs data: %-16llu\n",
sbi->rw_iostat[FS_DATA_READ_IO]);
seq_printf(seq, "fs gc data: %-16llu\n",
sbi->rw_iostat[FS_GDATA_READ_IO]);
seq_printf(seq, "fs compr_data: %-16llu\n",
sbi->rw_iostat[FS_CDATA_READ_IO]);
seq_printf(seq, "fs node: %-16llu\n",
sbi->rw_iostat[FS_NODE_READ_IO]);
seq_printf(seq, "fs meta: %-16llu\n",
sbi->rw_iostat[FS_META_READ_IO]);
/* print other IOs */
seq_puts(seq, "[OTHER]\n");
seq_printf(seq, "fs discard: %-16llu\n",
sbi->rw_iostat[FS_DISCARD]);
return 0;
}
static inline void __record_iostat_latency(struct f2fs_sb_info *sbi)
{
int io, idx = 0;
unsigned int cnt;
struct f2fs_iostat_latency iostat_lat[MAX_IO_TYPE][NR_PAGE_TYPE];
struct iostat_lat_info *io_lat = sbi->iostat_io_lat;
spin_lock_irq(&sbi->iostat_lat_lock);
for (idx = 0; idx < MAX_IO_TYPE; idx++) {
for (io = 0; io < NR_PAGE_TYPE; io++) {
cnt = io_lat->bio_cnt[idx][io];
iostat_lat[idx][io].peak_lat =
jiffies_to_msecs(io_lat->peak_lat[idx][io]);
iostat_lat[idx][io].cnt = cnt;
iostat_lat[idx][io].avg_lat = cnt ?
jiffies_to_msecs(io_lat->sum_lat[idx][io]) / cnt : 0;
io_lat->sum_lat[idx][io] = 0;
io_lat->peak_lat[idx][io] = 0;
io_lat->bio_cnt[idx][io] = 0;
}
}
spin_unlock_irq(&sbi->iostat_lat_lock);
trace_f2fs_iostat_latency(sbi, iostat_lat);
}
static inline void f2fs_record_iostat(struct f2fs_sb_info *sbi)
{
unsigned long long iostat_diff[NR_IO_TYPE];
int i;
if (time_is_after_jiffies(sbi->iostat_next_period))
return;
/* Need double check under the lock */
spin_lock(&sbi->iostat_lock);
if (time_is_after_jiffies(sbi->iostat_next_period)) {
spin_unlock(&sbi->iostat_lock);
return;
}
sbi->iostat_next_period = jiffies +
msecs_to_jiffies(sbi->iostat_period_ms);
for (i = 0; i < NR_IO_TYPE; i++) {
iostat_diff[i] = sbi->rw_iostat[i] -
sbi->prev_rw_iostat[i];
sbi->prev_rw_iostat[i] = sbi->rw_iostat[i];
}
spin_unlock(&sbi->iostat_lock);
trace_f2fs_iostat(sbi, iostat_diff);
__record_iostat_latency(sbi);
}
void f2fs_reset_iostat(struct f2fs_sb_info *sbi)
{
struct iostat_lat_info *io_lat = sbi->iostat_io_lat;
int i;
spin_lock(&sbi->iostat_lock);
for (i = 0; i < NR_IO_TYPE; i++) {
sbi->rw_iostat[i] = 0;
sbi->prev_rw_iostat[i] = 0;
}
spin_unlock(&sbi->iostat_lock);
spin_lock_irq(&sbi->iostat_lat_lock);
memset(io_lat, 0, sizeof(struct iostat_lat_info));
spin_unlock_irq(&sbi->iostat_lat_lock);
}
void f2fs_update_iostat(struct f2fs_sb_info *sbi,
enum iostat_type type, unsigned long long io_bytes)
{
if (!sbi->iostat_enable)
return;
spin_lock(&sbi->iostat_lock);
sbi->rw_iostat[type] += io_bytes;
if (type == APP_WRITE_IO || type == APP_DIRECT_IO)
sbi->rw_iostat[APP_BUFFERED_IO] =
sbi->rw_iostat[APP_WRITE_IO] -
sbi->rw_iostat[APP_DIRECT_IO];
if (type == APP_READ_IO || type == APP_DIRECT_READ_IO)
sbi->rw_iostat[APP_BUFFERED_READ_IO] =
sbi->rw_iostat[APP_READ_IO] -
sbi->rw_iostat[APP_DIRECT_READ_IO];
spin_unlock(&sbi->iostat_lock);
f2fs_record_iostat(sbi);
}
static inline void __update_iostat_latency(struct bio_iostat_ctx *iostat_ctx,
int rw, bool is_sync)
{
unsigned long ts_diff;
unsigned int iotype = iostat_ctx->type;
unsigned long flags;
struct f2fs_sb_info *sbi = iostat_ctx->sbi;
struct iostat_lat_info *io_lat = sbi->iostat_io_lat;
int idx;
if (!sbi->iostat_enable)
return;
ts_diff = jiffies - iostat_ctx->submit_ts;
if (iotype >= META_FLUSH)
iotype = META;
if (rw == 0) {
idx = READ_IO;
} else {
if (is_sync)
idx = WRITE_SYNC_IO;
else
idx = WRITE_ASYNC_IO;
}
spin_lock_irqsave(&sbi->iostat_lat_lock, flags);
io_lat->sum_lat[idx][iotype] += ts_diff;
io_lat->bio_cnt[idx][iotype]++;
if (ts_diff > io_lat->peak_lat[idx][iotype])
io_lat->peak_lat[idx][iotype] = ts_diff;
spin_unlock_irqrestore(&sbi->iostat_lat_lock, flags);
}
void iostat_update_and_unbind_ctx(struct bio *bio, int rw)
{
struct bio_iostat_ctx *iostat_ctx = bio->bi_private;
bool is_sync = bio->bi_opf & REQ_SYNC;
if (rw == 0)
bio->bi_private = iostat_ctx->post_read_ctx;
else
bio->bi_private = iostat_ctx->sbi;
__update_iostat_latency(iostat_ctx, rw, is_sync);
mempool_free(iostat_ctx, bio_iostat_ctx_pool);
}
void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
struct bio *bio, struct bio_post_read_ctx *ctx)
{
struct bio_iostat_ctx *iostat_ctx;
/* Due to the mempool, this never fails. */
iostat_ctx = mempool_alloc(bio_iostat_ctx_pool, GFP_NOFS);
iostat_ctx->sbi = sbi;
iostat_ctx->submit_ts = 0;
iostat_ctx->type = 0;
iostat_ctx->post_read_ctx = ctx;
bio->bi_private = iostat_ctx;
}
int __init f2fs_init_iostat_processing(void)
{
bio_iostat_ctx_cache =
kmem_cache_create("f2fs_bio_iostat_ctx",
sizeof(struct bio_iostat_ctx), 0, 0, NULL);
if (!bio_iostat_ctx_cache)
goto fail;
bio_iostat_ctx_pool =
mempool_create_slab_pool(NUM_PREALLOC_IOSTAT_CTXS,
bio_iostat_ctx_cache);
if (!bio_iostat_ctx_pool)
goto fail_free_cache;
return 0;
fail_free_cache:
kmem_cache_destroy(bio_iostat_ctx_cache);
fail:
return -ENOMEM;
}
void f2fs_destroy_iostat_processing(void)
{
mempool_destroy(bio_iostat_ctx_pool);
kmem_cache_destroy(bio_iostat_ctx_cache);
}
int f2fs_init_iostat(struct f2fs_sb_info *sbi)
{
/* init iostat info */
spin_lock_init(&sbi->iostat_lock);
spin_lock_init(&sbi->iostat_lat_lock);
sbi->iostat_enable = false;
sbi->iostat_period_ms = DEFAULT_IOSTAT_PERIOD_MS;
sbi->iostat_io_lat = f2fs_kzalloc(sbi, sizeof(struct iostat_lat_info),
GFP_KERNEL);
if (!sbi->iostat_io_lat)
return -ENOMEM;
return 0;
}
void f2fs_destroy_iostat(struct f2fs_sb_info *sbi)
{
kfree(sbi->iostat_io_lat);
}
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright 2021 Google LLC
* Author: Daeho Jeong <daehojeong@google.com>
*/
#ifndef __F2FS_IOSTAT_H__
#define __F2FS_IOSTAT_H__
struct bio_post_read_ctx;
#ifdef CONFIG_F2FS_IOSTAT
#define DEFAULT_IOSTAT_PERIOD_MS 3000
#define MIN_IOSTAT_PERIOD_MS 100
/* maximum period of iostat tracing is 1 day */
#define MAX_IOSTAT_PERIOD_MS 8640000
enum {
READ_IO,
WRITE_SYNC_IO,
WRITE_ASYNC_IO,
MAX_IO_TYPE,
};
struct iostat_lat_info {
unsigned long sum_lat[MAX_IO_TYPE][NR_PAGE_TYPE]; /* sum of io latencies */
unsigned long peak_lat[MAX_IO_TYPE][NR_PAGE_TYPE]; /* peak io latency */
unsigned int bio_cnt[MAX_IO_TYPE][NR_PAGE_TYPE]; /* bio count */
};
extern int __maybe_unused iostat_info_seq_show(struct seq_file *seq,
void *offset);
extern void f2fs_reset_iostat(struct f2fs_sb_info *sbi);
extern void f2fs_update_iostat(struct f2fs_sb_info *sbi,
enum iostat_type type, unsigned long long io_bytes);
struct bio_iostat_ctx {
struct f2fs_sb_info *sbi;
unsigned long submit_ts;
enum page_type type;
struct bio_post_read_ctx *post_read_ctx;
};
static inline void iostat_update_submit_ctx(struct bio *bio,
enum page_type type)
{
struct bio_iostat_ctx *iostat_ctx = bio->bi_private;
iostat_ctx->submit_ts = jiffies;
iostat_ctx->type = type;
}
static inline struct bio_post_read_ctx *get_post_read_ctx(struct bio *bio)
{
struct bio_iostat_ctx *iostat_ctx = bio->bi_private;
return iostat_ctx->post_read_ctx;
}
extern void iostat_update_and_unbind_ctx(struct bio *bio, int rw);
extern void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
struct bio *bio, struct bio_post_read_ctx *ctx);
extern int f2fs_init_iostat_processing(void);
extern void f2fs_destroy_iostat_processing(void);
extern int f2fs_init_iostat(struct f2fs_sb_info *sbi);
extern void f2fs_destroy_iostat(struct f2fs_sb_info *sbi);
#else
static inline void f2fs_update_iostat(struct f2fs_sb_info *sbi,
enum iostat_type type, unsigned long long io_bytes) {}
static inline void iostat_update_and_unbind_ctx(struct bio *bio, int rw) {}
static inline void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
struct bio *bio, struct bio_post_read_ctx *ctx) {}
static inline void iostat_update_submit_ctx(struct bio *bio,
enum page_type type) {}
static inline struct bio_post_read_ctx *get_post_read_ctx(struct bio *bio)
{
return bio->bi_private;
}
static inline int f2fs_init_iostat_processing(void) { return 0; }
static inline void f2fs_destroy_iostat_processing(void) {}
static inline int f2fs_init_iostat(struct f2fs_sb_info *sbi) { return 0; }
static inline void f2fs_destroy_iostat(struct f2fs_sb_info *sbi) {}
#endif
#endif /* __F2FS_IOSTAT_H__ */
This diff is collapsed.
......@@ -91,7 +91,8 @@ static struct fsync_inode_entry *add_fsync_inode(struct f2fs_sb_info *sbi,
goto err_out;
}
entry = f2fs_kmem_cache_alloc(fsync_entry_slab, GFP_F2FS_ZERO);
entry = f2fs_kmem_cache_alloc(fsync_entry_slab,
GFP_F2FS_ZERO, true, NULL);
entry->inode = inode;
list_add_tail(&entry->list, head);
......
This diff is collapsed.
......@@ -142,7 +142,7 @@ enum {
};
/*
* In the victim_sel_policy->alloc_mode, there are two block allocation modes.
* In the victim_sel_policy->alloc_mode, there are three block allocation modes.
* LFS writes data sequentially with cleaning operations.
* SSR (Slack Space Recycle) reuses obsolete space without cleaning operations.
* AT_SSR (Age Threshold based Slack Space Recycle) merges fragments into
......@@ -155,7 +155,7 @@ enum {
};
/*
* In the victim_sel_policy->gc_mode, there are two gc, aka cleaning, modes.
* In the victim_sel_policy->gc_mode, there are three gc, aka cleaning, modes.
* GC_CB is based on cost-benefit algorithm.
* GC_GREEDY is based on greedy algorithm.
* GC_AT is based on age-threshold algorithm.
......
This diff is collapsed.
......@@ -17,6 +17,7 @@
#include "f2fs.h"
#include "segment.h"
#include "gc.h"
#include "iostat.h"
#include <trace/events/f2fs.h>
static struct proc_dir_entry *f2fs_proc_root;
......@@ -307,6 +308,14 @@ static ssize_t f2fs_sbi_show(struct f2fs_attr *a,
return sysfs_emit(buf, "%u\n", sbi->compr_new_inode);
#endif
if (!strcmp(a->attr.name, "gc_segment_mode"))
return sysfs_emit(buf, "%u\n", sbi->gc_segment_mode);
if (!strcmp(a->attr.name, "gc_reclaimed_segments")) {
return sysfs_emit(buf, "%u\n",
sbi->gc_reclaimed_segs[sbi->gc_segment_mode]);
}
ui = (unsigned int *)(ptr + a->offset);
return sprintf(buf, "%u\n", *ui);
......@@ -343,7 +352,7 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
set = false;
}
if (strlen(name) >= F2FS_EXTENSION_LEN)
if (!strlen(name) || strlen(name) >= F2FS_EXTENSION_LEN)
return -EINVAL;
down_write(&sbi->sb_lock);
......@@ -420,6 +429,8 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
if (!strcmp(a->attr.name, "discard_granularity")) {
if (t == 0 || t > MAX_PLIST_NUM)
return -EINVAL;
if (!f2fs_block_unit_discard(sbi))
return -EINVAL;
if (t == *ui)
return count;
*ui = t;
......@@ -467,6 +478,7 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
return count;
}
#ifdef CONFIG_F2FS_IOSTAT
if (!strcmp(a->attr.name, "iostat_enable")) {
sbi->iostat_enable = !!t;
if (!sbi->iostat_enable)
......@@ -482,6 +494,7 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
spin_unlock(&sbi->iostat_lock);
return count;
}
#endif
#ifdef CONFIG_F2FS_FS_COMPRESSION
if (!strcmp(a->attr.name, "compr_written_block") ||
......@@ -515,6 +528,29 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
return count;
}
if (!strcmp(a->attr.name, "gc_segment_mode")) {
if (t < MAX_GC_MODE)
sbi->gc_segment_mode = t;
else
return -EINVAL;
return count;
}
if (!strcmp(a->attr.name, "gc_reclaimed_segments")) {
if (t != 0)
return -EINVAL;
sbi->gc_reclaimed_segs[sbi->gc_segment_mode] = 0;
return count;
}
if (!strcmp(a->attr.name, "seq_file_ra_mul")) {
if (t >= MIN_RA_MUL && t <= MAX_RA_MUL)
sbi->seq_file_ra_mul = t;
else
return -EINVAL;
return count;
}
*ui = (unsigned int)t;
return count;
......@@ -667,8 +703,10 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, discard_idle_interval,
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_idle_interval, interval_time[GC_TIME]);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info,
umount_discard_timeout, interval_time[UMOUNT_DISCARD_TIMEOUT]);
#ifdef CONFIG_F2FS_IOSTAT
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_enable, iostat_enable);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_period_ms, iostat_period_ms);
#endif
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, readdir_ra, readdir_ra);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_io_bytes, max_io_bytes);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_pin_file_thresh, gc_pin_file_threshold);
......@@ -740,6 +778,10 @@ F2FS_RW_ATTR(ATGC_INFO, atgc_management, atgc_candidate_count, max_candidate_cou
F2FS_RW_ATTR(ATGC_INFO, atgc_management, atgc_age_weight, age_weight);
F2FS_RW_ATTR(ATGC_INFO, atgc_management, atgc_age_threshold, age_threshold);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, seq_file_ra_mul, seq_file_ra_mul);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_segment_mode, gc_segment_mode);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_reclaimed_segments, gc_reclaimed_segs);
#define ATTR_LIST(name) (&f2fs_attr_##name.attr)
static struct attribute *f2fs_attrs[] = {
ATTR_LIST(gc_urgent_sleep_time),
......@@ -770,8 +812,10 @@ static struct attribute *f2fs_attrs[] = {
ATTR_LIST(discard_idle_interval),
ATTR_LIST(gc_idle_interval),
ATTR_LIST(umount_discard_timeout),
#ifdef CONFIG_F2FS_IOSTAT
ATTR_LIST(iostat_enable),
ATTR_LIST(iostat_period_ms),
#endif
ATTR_LIST(readdir_ra),
ATTR_LIST(max_io_bytes),
ATTR_LIST(gc_pin_file_thresh),
......@@ -812,6 +856,9 @@ static struct attribute *f2fs_attrs[] = {
ATTR_LIST(atgc_candidate_count),
ATTR_LIST(atgc_age_weight),
ATTR_LIST(atgc_age_threshold),
ATTR_LIST(seq_file_ra_mul),
ATTR_LIST(gc_segment_mode),
ATTR_LIST(gc_reclaimed_segments),
NULL,
};
ATTRIBUTE_GROUPS(f2fs);
......@@ -1036,101 +1083,6 @@ static int __maybe_unused segment_bits_seq_show(struct seq_file *seq,
return 0;
}
void f2fs_record_iostat(struct f2fs_sb_info *sbi)
{
unsigned long long iostat_diff[NR_IO_TYPE];
int i;
if (time_is_after_jiffies(sbi->iostat_next_period))
return;
/* Need double check under the lock */
spin_lock(&sbi->iostat_lock);
if (time_is_after_jiffies(sbi->iostat_next_period)) {
spin_unlock(&sbi->iostat_lock);
return;
}
sbi->iostat_next_period = jiffies +
msecs_to_jiffies(sbi->iostat_period_ms);
for (i = 0; i < NR_IO_TYPE; i++) {
iostat_diff[i] = sbi->rw_iostat[i] -
sbi->prev_rw_iostat[i];
sbi->prev_rw_iostat[i] = sbi->rw_iostat[i];
}
spin_unlock(&sbi->iostat_lock);
trace_f2fs_iostat(sbi, iostat_diff);
}
static int __maybe_unused iostat_info_seq_show(struct seq_file *seq,
void *offset)
{
struct super_block *sb = seq->private;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
time64_t now = ktime_get_real_seconds();
if (!sbi->iostat_enable)
return 0;
seq_printf(seq, "time: %-16llu\n", now);
/* print app write IOs */
seq_puts(seq, "[WRITE]\n");
seq_printf(seq, "app buffered: %-16llu\n",
sbi->rw_iostat[APP_BUFFERED_IO]);
seq_printf(seq, "app direct: %-16llu\n",
sbi->rw_iostat[APP_DIRECT_IO]);
seq_printf(seq, "app mapped: %-16llu\n",
sbi->rw_iostat[APP_MAPPED_IO]);
/* print fs write IOs */
seq_printf(seq, "fs data: %-16llu\n",
sbi->rw_iostat[FS_DATA_IO]);
seq_printf(seq, "fs node: %-16llu\n",
sbi->rw_iostat[FS_NODE_IO]);
seq_printf(seq, "fs meta: %-16llu\n",
sbi->rw_iostat[FS_META_IO]);
seq_printf(seq, "fs gc data: %-16llu\n",
sbi->rw_iostat[FS_GC_DATA_IO]);
seq_printf(seq, "fs gc node: %-16llu\n",
sbi->rw_iostat[FS_GC_NODE_IO]);
seq_printf(seq, "fs cp data: %-16llu\n",
sbi->rw_iostat[FS_CP_DATA_IO]);
seq_printf(seq, "fs cp node: %-16llu\n",
sbi->rw_iostat[FS_CP_NODE_IO]);
seq_printf(seq, "fs cp meta: %-16llu\n",
sbi->rw_iostat[FS_CP_META_IO]);
/* print app read IOs */
seq_puts(seq, "[READ]\n");
seq_printf(seq, "app buffered: %-16llu\n",
sbi->rw_iostat[APP_BUFFERED_READ_IO]);
seq_printf(seq, "app direct: %-16llu\n",
sbi->rw_iostat[APP_DIRECT_READ_IO]);
seq_printf(seq, "app mapped: %-16llu\n",
sbi->rw_iostat[APP_MAPPED_READ_IO]);
/* print fs read IOs */
seq_printf(seq, "fs data: %-16llu\n",
sbi->rw_iostat[FS_DATA_READ_IO]);
seq_printf(seq, "fs gc data: %-16llu\n",
sbi->rw_iostat[FS_GDATA_READ_IO]);
seq_printf(seq, "fs compr_data: %-16llu\n",
sbi->rw_iostat[FS_CDATA_READ_IO]);
seq_printf(seq, "fs node: %-16llu\n",
sbi->rw_iostat[FS_NODE_READ_IO]);
seq_printf(seq, "fs meta: %-16llu\n",
sbi->rw_iostat[FS_META_READ_IO]);
/* print other IOs */
seq_puts(seq, "[OTHER]\n");
seq_printf(seq, "fs discard: %-16llu\n",
sbi->rw_iostat[FS_DISCARD]);
return 0;
}
static int __maybe_unused victim_bits_seq_show(struct seq_file *seq,
void *offset)
{
......@@ -1213,13 +1165,15 @@ int f2fs_register_sysfs(struct f2fs_sb_info *sbi)
sbi->s_proc = proc_mkdir(sb->s_id, f2fs_proc_root);
if (sbi->s_proc) {
proc_create_single_data("segment_info", S_IRUGO, sbi->s_proc,
proc_create_single_data("segment_info", 0444, sbi->s_proc,
segment_info_seq_show, sb);
proc_create_single_data("segment_bits", S_IRUGO, sbi->s_proc,
proc_create_single_data("segment_bits", 0444, sbi->s_proc,
segment_bits_seq_show, sb);
proc_create_single_data("iostat_info", S_IRUGO, sbi->s_proc,
#ifdef CONFIG_F2FS_IOSTAT
proc_create_single_data("iostat_info", 0444, sbi->s_proc,
iostat_info_seq_show, sb);
proc_create_single_data("victim_bits", S_IRUGO, sbi->s_proc,
#endif
proc_create_single_data("victim_bits", 0444, sbi->s_proc,
victim_bits_seq_show, sb);
}
return 0;
......@@ -1238,7 +1192,9 @@ int f2fs_register_sysfs(struct f2fs_sb_info *sbi)
void f2fs_unregister_sysfs(struct f2fs_sb_info *sbi)
{
if (sbi->s_proc) {
#ifdef CONFIG_F2FS_IOSTAT
remove_proc_entry("iostat_info", sbi->s_proc);
#endif
remove_proc_entry("segment_info", sbi->s_proc);
remove_proc_entry("segment_bits", sbi->s_proc);
remove_proc_entry("victim_bits", sbi->s_proc);
......
......@@ -27,7 +27,8 @@ static void *xattr_alloc(struct f2fs_sb_info *sbi, int size, bool *is_inline)
{
if (likely(size == sbi->inline_xattr_slab_size)) {
*is_inline = true;
return kmem_cache_zalloc(sbi->inline_xattr_slab, GFP_NOFS);
return f2fs_kmem_cache_alloc(sbi->inline_xattr_slab,
GFP_F2FS_ZERO, false, sbi);
}
*is_inline = false;
return f2fs_kzalloc(sbi, size, GFP_NOFS);
......
......@@ -1818,6 +1818,7 @@ DEFINE_EVENT(f2fs_zip_end, f2fs_decompress_pages_end,
TP_ARGS(inode, cluster_idx, compressed_size, ret)
);
#ifdef CONFIG_F2FS_IOSTAT
TRACE_EVENT(f2fs_iostat,
TP_PROTO(struct f2fs_sb_info *sbi, unsigned long long *iostat),
......@@ -1894,6 +1895,102 @@ TRACE_EVENT(f2fs_iostat,
__entry->fs_cdrio, __entry->fs_nrio, __entry->fs_mrio)
);
#ifndef __F2FS_IOSTAT_LATENCY_TYPE
#define __F2FS_IOSTAT_LATENCY_TYPE
struct f2fs_iostat_latency {
unsigned int peak_lat;
unsigned int avg_lat;
unsigned int cnt;
};
#endif /* __F2FS_IOSTAT_LATENCY_TYPE */
TRACE_EVENT(f2fs_iostat_latency,
TP_PROTO(struct f2fs_sb_info *sbi, struct f2fs_iostat_latency (*iostat_lat)[NR_PAGE_TYPE]),
TP_ARGS(sbi, iostat_lat),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(unsigned int, d_rd_peak)
__field(unsigned int, d_rd_avg)
__field(unsigned int, d_rd_cnt)
__field(unsigned int, n_rd_peak)
__field(unsigned int, n_rd_avg)
__field(unsigned int, n_rd_cnt)
__field(unsigned int, m_rd_peak)
__field(unsigned int, m_rd_avg)
__field(unsigned int, m_rd_cnt)
__field(unsigned int, d_wr_s_peak)
__field(unsigned int, d_wr_s_avg)
__field(unsigned int, d_wr_s_cnt)
__field(unsigned int, n_wr_s_peak)
__field(unsigned int, n_wr_s_avg)
__field(unsigned int, n_wr_s_cnt)
__field(unsigned int, m_wr_s_peak)
__field(unsigned int, m_wr_s_avg)
__field(unsigned int, m_wr_s_cnt)
__field(unsigned int, d_wr_as_peak)
__field(unsigned int, d_wr_as_avg)
__field(unsigned int, d_wr_as_cnt)
__field(unsigned int, n_wr_as_peak)
__field(unsigned int, n_wr_as_avg)
__field(unsigned int, n_wr_as_cnt)
__field(unsigned int, m_wr_as_peak)
__field(unsigned int, m_wr_as_avg)
__field(unsigned int, m_wr_as_cnt)
),
TP_fast_assign(
__entry->dev = sbi->sb->s_dev;
__entry->d_rd_peak = iostat_lat[0][DATA].peak_lat;
__entry->d_rd_avg = iostat_lat[0][DATA].avg_lat;
__entry->d_rd_cnt = iostat_lat[0][DATA].cnt;
__entry->n_rd_peak = iostat_lat[0][NODE].peak_lat;
__entry->n_rd_avg = iostat_lat[0][NODE].avg_lat;
__entry->n_rd_cnt = iostat_lat[0][NODE].cnt;
__entry->m_rd_peak = iostat_lat[0][META].peak_lat;
__entry->m_rd_avg = iostat_lat[0][META].avg_lat;
__entry->m_rd_cnt = iostat_lat[0][META].cnt;
__entry->d_wr_s_peak = iostat_lat[1][DATA].peak_lat;
__entry->d_wr_s_avg = iostat_lat[1][DATA].avg_lat;
__entry->d_wr_s_cnt = iostat_lat[1][DATA].cnt;
__entry->n_wr_s_peak = iostat_lat[1][NODE].peak_lat;
__entry->n_wr_s_avg = iostat_lat[1][NODE].avg_lat;
__entry->n_wr_s_cnt = iostat_lat[1][NODE].cnt;
__entry->m_wr_s_peak = iostat_lat[1][META].peak_lat;
__entry->m_wr_s_avg = iostat_lat[1][META].avg_lat;
__entry->m_wr_s_cnt = iostat_lat[1][META].cnt;
__entry->d_wr_as_peak = iostat_lat[2][DATA].peak_lat;
__entry->d_wr_as_avg = iostat_lat[2][DATA].avg_lat;
__entry->d_wr_as_cnt = iostat_lat[2][DATA].cnt;
__entry->n_wr_as_peak = iostat_lat[2][NODE].peak_lat;
__entry->n_wr_as_avg = iostat_lat[2][NODE].avg_lat;
__entry->n_wr_as_cnt = iostat_lat[2][NODE].cnt;
__entry->m_wr_as_peak = iostat_lat[2][META].peak_lat;
__entry->m_wr_as_avg = iostat_lat[2][META].avg_lat;
__entry->m_wr_as_cnt = iostat_lat[2][META].cnt;
),
TP_printk("dev = (%d,%d), "
"iotype [peak lat.(ms)/avg lat.(ms)/count], "
"rd_data [%u/%u/%u], rd_node [%u/%u/%u], rd_meta [%u/%u/%u], "
"wr_sync_data [%u/%u/%u], wr_sync_node [%u/%u/%u], "
"wr_sync_meta [%u/%u/%u], wr_async_data [%u/%u/%u], "
"wr_async_node [%u/%u/%u], wr_async_meta [%u/%u/%u]",
show_dev(__entry->dev),
__entry->d_rd_peak, __entry->d_rd_avg, __entry->d_rd_cnt,
__entry->n_rd_peak, __entry->n_rd_avg, __entry->n_rd_cnt,
__entry->m_rd_peak, __entry->m_rd_avg, __entry->m_rd_cnt,
__entry->d_wr_s_peak, __entry->d_wr_s_avg, __entry->d_wr_s_cnt,
__entry->n_wr_s_peak, __entry->n_wr_s_avg, __entry->n_wr_s_cnt,
__entry->m_wr_s_peak, __entry->m_wr_s_avg, __entry->m_wr_s_cnt,
__entry->d_wr_as_peak, __entry->d_wr_as_avg, __entry->d_wr_as_cnt,
__entry->n_wr_as_peak, __entry->n_wr_as_avg, __entry->n_wr_as_cnt,
__entry->m_wr_as_peak, __entry->m_wr_as_avg, __entry->m_wr_as_cnt)
);
#endif
TRACE_EVENT(f2fs_bmap,
TP_PROTO(struct inode *inode, sector_t lblock, sector_t pblock),
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment