1. 04 Jan, 2022 9 commits
    • f2fs: Simplify bool conversion · d361b690
      Yang Li authored
      Fix the following coccicheck warning:
      ./fs/f2fs/sysfs.c:491:41-46: WARNING: conversion to bool not needed here
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't drop compressed page cache in .{invalidate,release}page · 2a64e303
      Chao Yu authored
      For compressed inode, in .{invalidate,release}page, we will call
      f2fs_invalidate_compress_pages() to drop all compressed page cache of
      current inode.
      
      But we don't need to drop the compressed page cache synchronously in
      .invalidatepage, because all truncation paths of compressed physical
      blocks are already covered by f2fs_invalidate_compress_page().
      
      We also don't need to drop the compressed page cache synchronously
      in .releasepage, because under memory pressure we can count on
      page cache reclaim on sbi->compress_inode.
      
      BTW, this patch may fix the issue reported below:
      
      https://lore.kernel.org/linux-f2fs-devel/20211202092812.197647-1-changfengnan@vivo.com/T/#u

      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to reserve space for IO align feature · 300a8429
      Chao Yu authored
      https://bugzilla.kernel.org/show_bug.cgi?id=204137
      
      With below script, we will hit panic during new segment allocation:
      
      DISK=bingo.img
      MOUNT_DIR=/mnt/f2fs
      
      dd if=/dev/zero of=$DISK bs=1M count=105
      mkfs.f2fs -a 1 -o 19 -t 1 -z 1 -f -q $DISK
      
      mount -t f2fs $DISK $MOUNT_DIR -o "noinline_dentry,flush_merge,noextent_cache,mode=lfs,io_bits=7,fsync_mode=strict"
      
      for (( i = 0; i < 4096; i++ )); do
      	name=`head /dev/urandom | tr -dc A-Za-z0-9 | head -c 10`
      	mkdir $MOUNT_DIR/$name
      done
      
      umount $MOUNT_DIR
      rm $DISK
      
      --- Core dump ---
      Call Trace:
       allocate_segment_by_default+0x9d/0x100 [f2fs]
       f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
       do_write_page+0x62/0x110 [f2fs]
       f2fs_outplace_write_data+0x43/0xc0 [f2fs]
       f2fs_do_write_data_page+0x386/0x560 [f2fs]
       __write_data_page+0x706/0x850 [f2fs]
       f2fs_write_cache_pages+0x267/0x6a0 [f2fs]
       f2fs_write_data_pages+0x19c/0x2e0 [f2fs]
       do_writepages+0x1c/0x70
       __filemap_fdatawrite_range+0xaa/0xe0
       filemap_fdatawrite+0x1f/0x30
       f2fs_sync_dirty_inodes+0x74/0x1f0 [f2fs]
       block_operations+0xdc/0x350 [f2fs]
       f2fs_write_checkpoint+0x104/0x1150 [f2fs]
       f2fs_sync_fs+0xa2/0x120 [f2fs]
       f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
       f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
       do_writepages+0x1c/0x70
       __writeback_single_inode+0x45/0x320
       writeback_sb_inodes+0x273/0x5c0
       wb_writeback+0xff/0x2e0
       wb_workfn+0xa1/0x370
       process_one_work+0x138/0x350
       worker_thread+0x4d/0x3d0
       kthread+0x109/0x140
       ret_from_fork+0x25/0x30
      
      The root cause is that, with the IO alignment feature enabled, a
      single 4k write needs F2FS_IO_SIZE() free blocks in the worst case,
      because the feature fills in dummy pages to keep each IO aligned.
      
      So we can easily run out of free segments during writeback of a
      non-inline directory's data, even while foreground GC is running.
      
      In order to fix this issue, I just propose to reserve additional free
      space for IO alignment feature to handle worst case of free space usage
      ratio during FGGC.
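
To make the worst case concrete, here is a rough back-of-the-envelope sketch. It assumes that F2FS_IO_SIZE() equals `1 << io_bits` blocks and that a block is 4 KiB; the helper names are illustrative and do not come from the kernel source.

```python
BLOCK_SIZE = 4096  # bytes; the "4k" block from the commit message


def io_size_blocks(io_bits: int) -> int:
    """Blocks per aligned IO unit, assuming F2FS_IO_SIZE() == 1 << io_bits."""
    return 1 << io_bits


def worst_case_bytes(io_bits: int) -> int:
    """Space one 4k write may consume if the rest of the unit is dummy pages."""
    return io_size_blocks(io_bits) * BLOCK_SIZE


# The reproducer above mounts with io_bits=7.
print(io_size_blocks(7), worst_case_bytes(7) // 1024)  # blocks, KiB
```

Under these assumptions one 4k write can consume 128 blocks (512 KiB), so on the order of two hundred worst-case writes would cover the 105 MB image used by the reproducer.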
      
      Fixes: 0a595eba ("f2fs: support IO alignment for DATA and NODE writes")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to check available space of CP area correctly in update_ckpt_flags() · b702c83e
      Chao Yu authored
      Otherwise, nat_bit area may be persisted across boundary of CP area during
      nat_bit rebuilding.
      
      Fixes: 94c821fb ("f2fs: rebuild nat_bits during umount")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support fault injection to f2fs_trylock_op() · 3e020389
      Chao Yu authored
      f2fs: support fault injection for f2fs_trylock_op()
      
      This patch adds support for injecting faults into f2fs_trylock_op().
      
      Usage:
      a) echo 65536 > /sys/fs/f2fs/<dev>/inject_type or
      b) mount -o fault_type=65536 <dev> <mountpoint>
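
The value 65536 is a single-bit mask (1 << 16). A small sketch for decoding such masks follows, assuming fault types are selected by bit position. `set_bits` is a hypothetical helper; the mapping from bit positions to fault names is left out because it depends on the kernel version.

```python
def set_bits(mask: int) -> list:
    """Return the positions of the bits that are set in a fault-type mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


print(set_bits(65536))
```

Several fault types could be combined by OR-ing their masks before writing the value, in which case `set_bits` would return more than one position.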
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clean up __find_inline_xattr() with __find_xattr() · dd9d4a3a
      Chao Yu authored
      Just cleanup, no logic change.
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr() · 645a3c40
      Chao Yu authored
      As Wenqing Liu reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=215235
      
      - Overview
      page fault in f2fs_setxattr() when mounting and operating on a corrupted image
      
      - Reproduce
      tested on kernel 5.16-rc3, 5.15.X under root
      
      1. unzip tmp7.zip
      2. ./single.sh f2fs 7
      
      Sometimes need to run the script several times
      
      - Kernel dump
      loop0: detected capacity change from 0 to 131072
      F2FS-fs (loop0): Found nat_bits in checkpoint
      F2FS-fs (loop0): Mounted with checkpoint version = 7548c2ee
      BUG: unable to handle page fault for address: ffffe47bc7123f48
      RIP: 0010:kfree+0x66/0x320
      Call Trace:
       __f2fs_setxattr+0x2aa/0xc00 [f2fs]
       f2fs_setxattr+0xfa/0x480 [f2fs]
       __f2fs_set_acl+0x19b/0x330 [f2fs]
       __vfs_removexattr+0x52/0x70
       __vfs_removexattr_locked+0xb1/0x140
       vfs_removexattr+0x56/0x100
       removexattr+0x57/0x80
       path_removexattr+0xa3/0xc0
       __x64_sys_removexattr+0x17/0x20
       do_syscall_64+0x37/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The root cause is that __f2fs_setxattr() lacks a sanity check on the
      last xattr entry, which results in out-of-bounds memory access while
      updating the inconsistent xattr data of the target inode.
      
      After the fix, it can detect such xattr inconsistency as below:
      
      F2FS-fs (loop11): inode (7) has invalid last xattr entry, entry_size: 60676
      F2FS-fs (loop11): inode (8) has corrupted xattr
      F2FS-fs (loop11): inode (8) has corrupted xattr
      F2FS-fs (loop11): inode (8) has invalid last xattr entry, entry_size: 47736
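
The idea behind such a check can be sketched as a bounds-checked walk over the xattr area. This is a simplified model, not the kernel code: the 4-byte header layout (name index, name length, value size), the 4-byte alignment, and the all-zero terminator are assumptions, and `last_entry_offset` is a hypothetical name.

```python
import struct

# Assumed entry header: name_index (u8), name_len (u8), value_size (le16).
HEADER = struct.Struct("<BBH")


def entry_size(name_len: int, value_size: int) -> int:
    """Entry length rounded up to a 4-byte boundary."""
    return (HEADER.size + name_len + value_size + 3) & ~3


def last_entry_offset(buf: bytes) -> int:
    """Walk the entries and return the offset of the terminator.

    Raises ValueError if an entry would run past the end of the buffer,
    which is the kind of inconsistency the sanity check is meant to catch.
    """
    off = 0
    while True:
        if off + HEADER.size > len(buf):
            raise ValueError(f"xattr entry at offset {off} is out of bounds")
        if buf[off:off + HEADER.size] == b"\x00" * HEADER.size:
            return off  # terminator found inside the buffer
        _, name_len, value_size = HEADER.unpack_from(buf, off)
        off += entry_size(name_len, value_size)


# One well-formed entry ("ab" -> "xyz", padded to 12 bytes) plus terminator.
good = HEADER.pack(1, 2, 3) + b"ab" + b"xyz" + b"\x00" * 3 + b"\x00" * 4
print(last_entry_offset(good))
```

A corrupted entry that claims a huge value size pushes the walk past the buffer, and the model rejects it instead of reading or writing beyond the allocation.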
      
      Cc: stable@vger.kernel.org
      Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: do not bother checkpoint by f2fs_get_node_info · a9419b63
      Jaegeuk Kim authored
      This patch tries to mitigate lock contention between f2fs_write_checkpoint and
      f2fs_get_node_info along with nat_tree_lock.
      
      The idea is, if checkpoint is currently running, other threads that try to grab
      nat_tree_lock would be better to wait for checkpoint.
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid down_write on nat_tree_lock during checkpoint · 0df035c7
      Jaegeuk Kim authored
      Let's cache nat entry if there's no lock contention only.
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 14 Dec, 2021 2 commits
    • f2fs: compress: fix potential deadlock of compress file · 7377e853
      Hyeong-Jun Kim authored
      There is a potential deadlock between the writeback process and a
      process performing write_begin() or write_cache_pages() when both try
      to write the same compressed file whose data is not compressible, as below:
      
      [Process A] - doing checkpoint
      [Process B]                     [Process C]
      f2fs_write_cache_pages()
      - lock_page() [all pages in cluster, 0-31]
      - f2fs_write_multi_pages()
       - f2fs_write_raw_pages()
        - f2fs_write_single_data_page()
         - f2fs_do_write_data_page()
           - return -EAGAIN [f2fs_trylock_op() failed]
         - unlock_page(page) [e.g., page 0]
                                      - generic_perform_write()
                                       - f2fs_write_begin()
                                        - f2fs_prepare_compress_overwrite()
                                         - prepare_compress_overwrite()
                                          - lock_page() [e.g., page 0]
                                          - lock_page() [e.g., page 1]
         - lock_page(page) [e.g., page 0]
      
      Since no compression takes place, it is no longer necessary to hold
      locks on every page in the cluster within f2fs_write_raw_pages().
      
      This patch changes f2fs_write_raw_pages() to release all locks first
      and then perform the write the same way as for a non-compressed file
      in f2fs_write_cache_pages().
      
      Fixes: 4c8ff709 ("f2fs: support data compression")
      Signed-off-by: Hyeong-Jun Kim <hj514.kim@samsung.com>
      Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
      Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid EINVAL by SBI_NEED_FSCK when pinning a file · 19bdba52
      Jaegeuk Kim authored
      Android OTA failed due to SBI_NEED_FSCK flag when pinning the file. Let's avoid
      it since we can do in-place-updates.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 10 Dec, 2021 10 commits
    • f2fs: add gc_urgent_high_remaining sysfs node · 325163e9
      Daeho Jeong authored
      Added a new sysfs node called gc_urgent_high_remaining. The user can
      set the trial count limit for GC urgent high mode with this value. If
      GC thread gets to the limit, the mode will turn back to GC normal mode.
      By default, the value is zero, which means there is no limit like before.
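
The behavior described above can be modeled as a simple countdown. This is an illustrative sketch only: `GcState`, its fields, and the point at which the counter is decremented are assumptions, not the kernel implementation.

```python
GC_NORMAL = "normal"
GC_URGENT_HIGH = "urgent_high"


class GcState:
    """Toy model of a GC mode with an optional trial-count limit."""

    def __init__(self, mode=GC_NORMAL, urgent_high_remaining=0):
        self.mode = mode
        # 0 means "no limit", matching the default described above.
        self.urgent_high_remaining = urgent_high_remaining

    def after_gc_trial(self):
        """Account for one GC trial; fall back to normal mode at the limit."""
        if self.mode == GC_URGENT_HIGH and self.urgent_high_remaining > 0:
            self.urgent_high_remaining -= 1
            if self.urgent_high_remaining == 0:
                self.mode = GC_NORMAL


state = GcState(GC_URGENT_HIGH, urgent_high_remaining=2)
state.after_gc_trial()
state.after_gc_trial()
print(state.mode)
```

With a limit of zero the model never leaves urgent high mode on its own, which corresponds to the "no limit like before" default.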
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do sanity check in is_alive() · 77900c45
      Chao Yu authored
      In a fuzzed image, the SSA table may indicate that a data block belongs
      to an invalid node whose node ID is out of range (0, 1, 2 or max_nid).
      To avoid migrating inconsistent data in such a corrupted image,
      let's do a sanity check before data block migration.
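
A range check of this kind can be sketched as follows. The constant `FIRST_VALID_NID = 3` is an assumption derived from the commit text (IDs 0, 1 and 2 are listed as invalid), and `nid_in_range` is a hypothetical helper name.

```python
FIRST_VALID_NID = 3  # assumed: node IDs 0, 1 and 2 are never valid owners


def nid_in_range(nid: int, max_nid: int) -> bool:
    """Return True if nid could legitimately own a data block."""
    return FIRST_VALID_NID <= nid < max_nid


for candidate in (0, 2, 3, 99, 100):
    print(candidate, nid_in_range(candidate, max_nid=100))
```

In this model a block whose owner fails the check would simply be skipped during migration rather than moved.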
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to avoid panic in is_alive() if metadata is inconsistent · f6db4307
      Chao Yu authored
      As reported by Wenqing Liu in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=215231
      
      With CONFIG_F2FS_CHECK_FS enabled and the fuzzed image attached to the
      above link, we will encounter a panic when executing the steps below:
      
      1. mkdir mnt
      2. mount -t f2fs tmp1.img mnt
      3. touch tmp
      
      F2FS-fs (loop11): mismatched blkaddr 5765 (source_blkaddr 1) in seg 3
      kernel BUG at fs/f2fs/gc.c:1042!
       do_garbage_collect+0x90f/0xa80 [f2fs]
       f2fs_gc+0x294/0x12a0 [f2fs]
       f2fs_balance_fs+0x2c5/0x7d0 [f2fs]
       f2fs_create+0x239/0xd90 [f2fs]
       lookup_open+0x45e/0xa90
       open_last_lookups+0x203/0x670
       path_openat+0xae/0x490
       do_filp_open+0xbc/0x160
       do_sys_openat2+0x2f1/0x500
       do_sys_open+0x5e/0xa0
       __x64_sys_openat+0x28/0x40
      
      Previously, f2fs tried to catch data inconsistency between the SSA and
      SIT tables during GC; however, once the inconsistency was caught, it
      called f2fs_bug_on and hung the kernel. That is not needed; instead,
      let's set the SBI_NEED_FSCK flag and skip migrating the current block.
      
      Fixes: bbf9f7d9 ("f2fs: Fix indefinite loop in f2fs_gc()")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do sanity check on inode type during garbage collection · 9056d648
      Chao Yu authored
      As reported by Wenqing Liu in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=215231
      
      - Overview
      kernel NULL pointer dereference triggered in folio_mark_dirty() when mounting and operating on a crafted f2fs image
      
      - Reproduce
      tested on kernel 5.16-rc3, 5.15.X under root
      
      1. mkdir mnt
      2. mount -t f2fs tmp1.img mnt
      3. touch tmp
      4. cp tmp mnt
      
      F2FS-fs (loop0): sanity_check_inode: inode (ino=49) extent info [5942, 4294180864, 4] is incorrect, run fsck to fix
      F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=31340049, run fsck to fix.
      BUG: kernel NULL pointer dereference, address: 0000000000000000
       folio_mark_dirty+0x33/0x50
       move_data_page+0x2dd/0x460 [f2fs]
       do_garbage_collect+0xc18/0x16a0 [f2fs]
       f2fs_gc+0x1d3/0xd90 [f2fs]
       f2fs_balance_fs+0x13a/0x570 [f2fs]
       f2fs_create+0x285/0x840 [f2fs]
       path_openat+0xe6d/0x1040
       do_filp_open+0xc5/0x140
       do_sys_openat2+0x23a/0x310
       do_sys_open+0x57/0x80
      
      The root cause is that for special files (character, block, FIFO or socket
      files), f2fs doesn't assign an address space operations pointer array to the
      mapping->a_ops field. So when the SSA table in a fuzzed image indicates that
      a data block belongs to a special file and f2fs tries to migrate that block,
      move_data_page() hits a NULL pointer access once it calls a_ops->set_dirty_page().
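
One plausible shape for such an inode-type check is sketched below, using the standard file-type predicates on the mode bits. `is_special_file` and `should_migrate` are hypothetical names, and the decision to skip special files entirely is an assumption about how the check is applied.

```python
import stat


def is_special_file(mode: int) -> bool:
    """True for character, block, FIFO and socket files."""
    return (stat.S_ISCHR(mode) or stat.S_ISBLK(mode)
            or stat.S_ISFIFO(mode) or stat.S_ISSOCK(mode))


def should_migrate(mode: int) -> bool:
    """Skip data blocks whose owning inode is a special file."""
    return not is_special_file(mode)


print(should_migrate(stat.S_IFREG | 0o644))
print(should_migrate(stat.S_IFCHR | 0o600))
```

Regular files and directories pass the check, while a block that claims to belong to a special file is treated as a sign of corruption and left alone.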
      
      Cc: stable@vger.kernel.org
      Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid duplicate call of mark_inode_dirty · 766c6639
      Jaegeuk Kim authored
      Let's check the condition first before set|clear bit.
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: show number of pending discard commands · ae2e2804
      Jaegeuk Kim authored
      This information can be used to check how much time we need to give to issue
      all the discard commands.
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support POSIX_FADV_DONTNEED drop compressed page cache · e64347ae
      Fengnan Chang authored
      Previously, the compressed page cache was dropped when the page cache
      was cleaned, but POSIX_FADV_DONTNEED could not clean the compressed page
      cache because raw pages don't have private data and therefore never
      call f2fs_invalidate_compress_pages(). This commit calls
      f2fs_invalidate_compress_pages() directly in f2fs_file_fadvise() for
      the POSIX_FADV_DONTNEED case.
      Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix remove page failed in invalidate compress pages · d1917865
      Fengnan Chang authored
      Since the compress inode is not a regular file, generic_error_remove_page
      in f2fs_invalidate_compress_pages will always fail. Set the compress
      inode as a regular file to fix it.
      
      Fixes: 6ce19aff ("f2fs: compress: add compress_inode to cache compressed blocks")
      Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: show more DIO information in tracepoint · bd984c03
      Jaegeuk Kim authored
      This prints more information of DIO in tracepoint.
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use iomap for direct I/O · a1e09b03
      Eric Biggers authored
      Make f2fs_file_read_iter() and f2fs_file_write_iter() use the iomap
      direct I/O implementation instead of the fs/direct-io.c one.
      
      The iomap implementation is more efficient, and it also avoids the need
      to add new features and optimizations to the old implementation.
      
      This new implementation also eliminates the need for f2fs to hook bio
      submission and completion and to allocate memory per-bio.  This is
      because it's possible to correctly update f2fs's in-flight DIO counters
      using __iomap_dio_rw() in combination with an implementation of
      iomap_dio_ops::end_io() (as suggested by Christoph Hellwig).
      
      When possible, this new implementation preserves existing f2fs behavior
      such as the conditions for falling back to buffered I/O.
      
      This patch has been tested with xfstests by running 'gce-xfstests -c
      f2fs -g auto -X generic/017' with and without this patch; no regressions
      were seen.  (Some tests fail both before and after.  generic/017 hangs
      both before and after, so it had to be excluded.)
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      [Jaegeuk Kim: use spin_lock_bh for f2fs_update_iostat in softirq]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  4. 04 Dec, 2021 4 commits
  5. 17 Nov, 2021 2 commits
  6. 15 Nov, 2021 3 commits
  7. 14 Nov, 2021 10 commits