1. 06 Sep, 2024 17 commits
  2. 21 Aug, 2024 10 commits
    • Christophe JAILLET's avatar
      f2fs: Use sysfs_emit_at() to simplify code · f7a678bb
      Christophe JAILLET authored
      This file already uses sysfs_emit(). So be consistent and also use
      sysfs_emit_at().
      
      This slightly simplifies the code and makes it more readable.
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      f7a678bb
    • Chao Yu's avatar
      f2fs: atomic: fix to forbid dio in atomic_file · b2c160f4
      Chao Yu authored
      atomic write can only be used via buffered IO, let's fail direct IO on
      atomic_file and return -EOPNOTSUPP.
      Signed-off-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      b2c160f4
    • Yeongjin Gil's avatar
      f2fs: compress: don't redirty sparse cluster during {,de}compress · f785cec2
      Yeongjin Gil authored
      In f2fs_do_write_data_page, when the data block is NULL_ADDR, it skips
      writepage considering that it has been already truncated.
      This results in an infinite loop as the PAGECACHE_TAG_TOWRITE tag is not
      cleared during the writeback process for a compressed file including
      NULL_ADDR in compress_mode=user.
      
      This is the reproduction process:
      
      1. dd if=/dev/zero bs=4096 count=1024 seek=1024 of=testfile
      2. f2fs_io compress testfile
      3. dd if=/dev/zero bs=4096 count=1 conv=notrunc of=testfile
      4. f2fs_io decompress testfile
      
      To prevent the problem, let's check whether the cluster is fully
      allocated before redirty its pages.
      
      Fixes: 5fdb322f ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE")
      Reviewed-by: default avatarSungjong Seo <sj1557.seo@samsung.com>
      Reviewed-by: default avatarSunmin Jeong <s_min.jeong@samsung.com>
      Tested-by: default avatarJaewook Kim <jw5454.kim@samsung.com>
      Signed-off-by: default avatarYeongjin Gil <youngjin.gil@samsung.com>
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      f785cec2
    • Shin'ichiro Kawasaki's avatar
      f2fs: check discard support for conventional zones · 43aec4d0
      Shin'ichiro Kawasaki authored
      As the helper function f2fs_bdev_support_discard() shows, f2fs checks if
      the target block devices support discard by calling
      bdev_max_discard_sectors() and bdev_is_zoned(). This check works well
      for most cases, but it does not work for conventional zones on zoned
      block devices. F2fs assumes that zoned block devices support discard,
      and calls __submit_discard_cmd(). When __submit_discard_cmd() is called
      for sequential write required zones, it works fine since
      __submit_discard_cmd() issues zone reset commands instead of discard
      commands. However, when __submit_discard_cmd() is called for
      conventional zones, __blkdev_issue_discard() is called even when the
      devices do not support discard.
      
      The inappropriate __blkdev_issue_discard() call was not a problem before
      the commit 30f1e724 ("block: move discard checks into the ioctl
      handler") because __blkdev_issue_discard() checked if the target devices
      support discard or not. If not, it returned EOPNOTSUPP. After the
      commit, __blkdev_issue_discard() no longer checks it. It always returns
      zero and sets NULL to the given bio pointer. This NULL pointer triggers
      f2fs_bug_on() in __submit_discard_cmd(). The BUG is recreated with the
      commands below at the umount step, where /dev/nullb0 is a zoned null_blk
      with 5GB total size, 128MB zone size and 10 conventional zones.
      
      $ mkfs.f2fs -f -m /dev/nullb0
      $ mount /dev/nullb0 /mnt
      $ for ((i=0;i<5;i++)); do dd if=/dev/zero of=/mnt/test bs=65536 count=1600 conv=fsync; done
      $ umount /mnt
      
      To fix the BUG, avoid the inappropriate __blkdev_issue_discard() call.
      When discard is requested for conventional zones, check if the device
      supports discard or not. If not, return EOPNOTSUPP.
      
      Fixes: 30f1e724 ("block: move discard checks into the ioctl handler")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: default avatarDamien Le Moal <dlemoal@kernel.org>
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      43aec4d0
    • Chao Yu's avatar
      f2fs: fix to avoid use-after-free in f2fs_stop_gc_thread() · c7f114d8
      Chao Yu authored
      syzbot reports a f2fs bug as below:
      
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
       print_report+0xe8/0x550 mm/kasan/report.c:491
       kasan_report+0x143/0x180 mm/kasan/report.c:601
       kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
       instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
       atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:252 [inline]
       __refcount_add include/linux/refcount.h:184 [inline]
       __refcount_inc include/linux/refcount.h:241 [inline]
       refcount_inc include/linux/refcount.h:258 [inline]
       get_task_struct include/linux/sched/task.h:118 [inline]
       kthread_stop+0xca/0x630 kernel/kthread.c:704
       f2fs_stop_gc_thread+0x65/0xb0 fs/f2fs/gc.c:210
       f2fs_do_shutdown+0x192/0x540 fs/f2fs/file.c:2283
       f2fs_ioc_shutdown fs/f2fs/file.c:2325 [inline]
       __f2fs_ioctl+0x443a/0xbe60 fs/f2fs/file.c:4325
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:907 [inline]
       __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      The root cause is below race condition, it may cause use-after-free
      issue in sbi->gc_th pointer.
      
      - remount
       - f2fs_remount
        - f2fs_stop_gc_thread
         - kfree(gc_th)
      				- f2fs_ioc_shutdown
      				 - f2fs_do_shutdown
      				  - f2fs_stop_gc_thread
      				   - kthread_stop(gc_th->f2fs_gc_task)
         : sbi->gc_thread = NULL;
      
      We will call f2fs_do_shutdown() in two paths:
      - for f2fs_ioc_shutdown() path, we should grab sb->s_umount semaphore
      for fixing.
      - for f2fs_shutdown() path, it's safe since caller has already grabbed
      sb->s_umount semaphore.
      
      Reported-by: syzbot+1a8e2b31f2ac9bd3d148@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/linux-f2fs-devel/0000000000005c7ccb061e032b9b@google.com
      Fixes: 7950e9ac ("f2fs: stop gc/discard thread after fs shutdown")
      Signed-off-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c7f114d8
    • Chao Yu's avatar
      f2fs: atomic: fix to truncate pagecache before on-disk metadata truncation · ebd3309a
      Chao Yu authored
      We should always truncate pagecache while truncating on-disk data.
      
      Fixes: a46bebd5 ("f2fs: synchronize atomic write aborts")
      Signed-off-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      ebd3309a
    • Chao Yu's avatar
      f2fs: fix to wait page writeback before setting gcing flag · a4d7f2b3
      Chao Yu authored
      Soft IRQ				Thread
      - f2fs_write_end_io
      					- f2fs_defragment_range
      					 - set_page_private_gcing
       - type = WB_DATA_TYPE(page, false);
       : assign type w/ F2FS_WB_CP_DATA
       due to page_private_gcing() is true
        - dec_page_count() w/ wrong type
        - end_page_writeback()
      
      Value of F2FS_WB_CP_DATA reference count may become negative under above
      race condition, the root cause is we missed to wait page writeback before
      setting gcing page private flag, let's fix it.
      
      Fixes: 2d1fe8a8 ("f2fs: fix to tag gcing flag on page during file defragment")
      Fixes: 4961acdd ("f2fs: fix to tag gcing flag on page during block migration")
      Signed-off-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      a4d7f2b3
    • Yeongjin Gil's avatar
      f2fs: Create COW inode from parent dentry for atomic write · 8c1b7879
      Yeongjin Gil authored
      The i_pino in f2fs_inode_info has the previous parent's i_ino when inode
      was renamed, which may cause f2fs_ioc_start_atomic_write to fail.
      If file_wrong_pino is true and i_nlink is 1, then to find a valid pino,
      we should refer to the dentry from inode.
      
      To resolve this issue, let's get parent inode using parent dentry
      directly.
      
      Fixes: 3db1de0e ("f2fs: change the current atomic write way")
      Reviewed-by: default avatarSungjong Seo <sj1557.seo@samsung.com>
      Reviewed-by: default avatarSunmin Jeong <s_min.jeong@samsung.com>
      Signed-off-by: default avatarYeongjin Gil <youngjin.gil@samsung.com>
      Reviewed-by: default avatarDaeho Jeong <daehojeong@google.com>
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      8c1b7879
    • Jann Horn's avatar
      f2fs: Require FMODE_WRITE for atomic write ioctls · 4f5a100f
      Jann Horn authored
      The F2FS ioctls for starting and committing atomic writes check for
      inode_owner_or_capable(), but this does not give LSMs like SELinux or
      Landlock an opportunity to deny the write access - if the caller's FSUID
      matches the inode's UID, inode_owner_or_capable() immediately returns true.
      
      There are scenarios where LSMs want to deny a process the ability to write
      particular files, even files that the FSUID of the process owns; but this
      can currently partially be bypassed using atomic write ioctls in two ways:
      
       - F2FS_IOC_START_ATOMIC_REPLACE + F2FS_IOC_COMMIT_ATOMIC_WRITE can
         truncate an inode to size 0
       - F2FS_IOC_START_ATOMIC_WRITE + F2FS_IOC_ABORT_ATOMIC_WRITE can revert
         changes another process concurrently made to a file
      
      Fix it by requiring FMODE_WRITE for these operations, just like for
      F2FS_IOC_MOVE_RANGE. Since any legitimate caller should only be using these
      ioctls when intending to write into the file, that seems unlikely to break
      anything.
      
      Fixes: 88b88a66 ("f2fs: support atomic writes")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Reviewed-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      4f5a100f
    • Zhiguo Niu's avatar
      f2fs: clean up val{>>,<<}F2FS_BLKSIZE_BITS · 8fb9f319
      Zhiguo Niu authored
      Use F2FS_BYTES_TO_BLK(bytes) and F2FS_BLK_TO_BYTES(blk) for cleanup
      Signed-off-by: default avatarZhiguo Niu <zhiguo.niu@unisoc.com>
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      8fb9f319
  3. 15 Aug, 2024 8 commits
  4. 05 Aug, 2024 5 commits
    • Chao Yu's avatar
      f2fs: atomic: fix to avoid racing w/ GC · 1a0bd289
      Chao Yu authored
      Case #1:
      SQLite App		GC Thread		Kworker		Shrinker
      - f2fs_ioc_start_atomic_write
      
      - f2fs_ioc_commit_atomic_write
       - f2fs_commit_atomic_write
        - filemap_write_and_wait_range
        : write atomic_file's data to cow_inode
      								echo 3 > drop_caches
      								to drop atomic_file's
      								cache.
      			- f2fs_gc
      			 - gc_data_segment
      			  - move_data_page
      			   - set_page_dirty
      
      						- writepages
      						 - f2fs_do_write_data_page
      						 : overwrite atomic_file's data
      						   to cow_inode
        - f2fs_down_write(&fi->i_gc_rwsem[WRITE])
        - __f2fs_commit_atomic_write
        - f2fs_up_write(&fi->i_gc_rwsem[WRITE])
      
      Case #2:
      SQLite App		GC Thread		Kworker
      - f2fs_ioc_start_atomic_write
      
      						- __writeback_single_inode
      						 - do_writepages
      						  - f2fs_write_cache_pages
      						   - f2fs_write_single_data_page
      						    - f2fs_do_write_data_page
      						    : write atomic_file's data to cow_inode
      			- f2fs_gc
      			 - gc_data_segment
      			  - move_data_page
      			   - set_page_dirty
      
      						- writepages
      						 - f2fs_do_write_data_page
      						 : overwrite atomic_file's data to cow_inode
      - f2fs_ioc_commit_atomic_write
      
      In above cases racing in between atomic_write and GC, previous
      data in atomic_file may be overwrited to cow_file, result in
      data corruption.
      
      This patch introduces PAGE_PRIVATE_ATOMIC_WRITE bit flag in page.private,
      and use it to indicate that there is last dirty data in atomic file,
      and the data should be writebacked into cow_file, if the flag is not
      tagged in page, we should never write data across files.
      
      Fixes: 3db1de0e ("f2fs: change the current atomic write way")
      Cc: Daeho Jeong <daehojeong@google.com>
      Signed-off-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      1a0bd289
    • Julian Sun's avatar
      f2fs: fix macro definition stat_inc_cp_count · d72750e4
      Julian Sun authored
      The macro stat_inc_cp_count accepts a parameter si,
      but it was not used, rather the variable sbi was directly used,
      which may be a local variable inside a function that calls the macros.
      Signed-off-by: default avatarJulian Sun <sunjunchao2870@gmail.com>
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      d72750e4
    • Julian Sun's avatar
      f2fs: fix macro definition on_f2fs_build_free_nids · d1e1ff97
      Julian Sun authored
      The macro on_f2fs_build_free_nids accepts a parameter nmi,
      but it was not used, rather the variable nm_i was directly used,
      which may be a local variable inside a function that calls the macros.
      Signed-off-by: default avatarJulian Sun <sunjunchao2870@gmail.com>
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      d1e1ff97
    • Liao Yuanhong's avatar
      f2fs: add write priority option based on zone UFS · 8444ce52
      Liao Yuanhong authored
      Currently, we are using a mix of traditional UFS and zone UFS to support
      some functionalities that cannot be achieved on zone UFS alone. However,
      there are some issues with this approach. There exists a significant
      performance difference between traditional UFS and zone UFS. Under normal
      usage, we prioritize writes to zone UFS. However, in critical conditions
      (such as when the entire UFS is almost full), we cannot determine whether
      data will be written to traditional UFS or zone UFS. This can lead to
      significant performance fluctuations, which is not conducive to
      development and testing. To address this, we have added an option
      zlu_io_enable under sys with the following three modes:
      1) zlu_io_enable == 0:Normal mode, prioritize writing to zone UFS;
      2) zlu_io_enable == 1:Zone UFS only mode, only allow writing to zone UFS;
      3) zlu_io_enable == 2:Traditional UFS priority mode, prioritize writing to
      traditional UFS.
      Signed-off-by: default avatarLiao Yuanhong <liaoyuanhong@vivo.com>
      Signed-off-by: default avatarWu Bo <bo.wu@vivo.com>
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      8444ce52
    • Nikita Zhandarovich's avatar
      f2fs: avoid potential int overflow in sanity_check_area_boundary() · 50438dbc
      Nikita Zhandarovich authored
      While calculating the end addresses of main area and segment 0, u32
      may be not enough to hold the result without the danger of int
      overflow.
      
      Just in case, play it safe and cast one of the operands to a
      wider type (u64).
      
      Found by Linux Verification Center (linuxtesting.org) with static
      analysis tool SVACE.
      
      Fixes: fd694733 ("f2fs: cover large section in sanity check of super")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarNikita Zhandarovich <n.zhandarovich@fintech.ru>
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      50438dbc