1. 06 Sep, 2024 7 commits
  2. 21 Aug, 2024 10 commits
    • f2fs: Use sysfs_emit_at() to simplify code · f7a678bb
      Christophe JAILLET authored
      This file already uses sysfs_emit(). So be consistent and also use
      sysfs_emit_at().
      
      This slightly simplifies the code and makes it more readable.
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
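      For readers unfamiliar with the helper, the following is a userspace
      approximation of what sysfs_emit_at() does: it formats text at offset
      "at" of a page-sized buffer and returns the number of characters actually
      stored, so a show() routine can keep a running length. The names
      emit_at(), show_list() and SKETCH_PAGE_SIZE are hypothetical, the real
      kernel helper additionally warns on a misaligned buffer or out-of-range
      offset, and this is not the f2fs code itself.

```c
#include <stdarg.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096  /* assumed page size for this sketch */

/* Userspace stand-in for sysfs_emit_at(): append formatted text at offset
 * 'at' of a page-sized buffer and return how many characters were stored
 * (never more than fit, like scnprintf). */
static int emit_at(char *buf, int at, const char *fmt, ...)
{
    va_list args;
    int len;

    if (at < 0 || at >= SKETCH_PAGE_SIZE)
        return 0;

    va_start(args, fmt);
    len = vsnprintf(buf + at, SKETCH_PAGE_SIZE - at, fmt, args);
    va_end(args);

    if (len < 0)
        return 0;
    /* vsnprintf reports the untruncated length; clamp to what was written. */
    if (len >= SKETCH_PAGE_SIZE - at)
        len = SKETCH_PAGE_SIZE - at - 1;
    return len;
}

/* Typical show() pattern: accumulate into buf with a running offset. */
static int show_list(char *buf, const char *const *items, int n)
{
    int len = 0;

    for (int i = 0; i < n; i++)
        len += emit_at(buf, len, "%s%s", i ? " " : "", items[i]);
    len += emit_at(buf, len, "\n");
    return len;
}
```

      With a running offset, each call only needs the current length, which
      is the simplification the commit message refers to.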
    • f2fs: atomic: fix to forbid dio in atomic_file · b2c160f4
      Chao Yu authored
      Atomic write can only be used via buffered IO, so let's fail direct IO
      on atomic_file and return -EOPNOTSUPP.
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
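      A minimal sketch of the check this commit describes, assuming a
      simplified inode structure. The names struct sketch_inode and
      check_direct_io() are hypothetical and stand in for the real f2fs
      direct IO path, which is not reproduced here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified inode state for this sketch. */
struct sketch_inode {
    bool atomic_file;   /* set once an atomic write has started on the file */
};

/* Decide whether a direct IO request may proceed. Atomic writes are staged
 * only through the page cache, so direct IO on an atomic file is refused. */
static int check_direct_io(const struct sketch_inode *inode)
{
    if (inode->atomic_file)
        return -EOPNOTSUPP;
    return 0;
}
```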
    • f2fs: compress: don't redirty sparse cluster during {,de}compress · f785cec2
      Yeongjin Gil authored
      In f2fs_do_write_data_page(), when the data block address is NULL_ADDR,
      writepage is skipped on the assumption that the block has already been
      truncated. This results in an infinite loop, as the PAGECACHE_TAG_TOWRITE
      tag is not cleared during writeback of a compressed file that contains
      NULL_ADDR blocks in compress_mode=user.
      
      This is the reproduction process:
      
      1. dd if=/dev/zero bs=4096 count=1024 seek=1024 of=testfile
      2. f2fs_io compress testfile
      3. dd if=/dev/zero bs=4096 count=1 conv=notrunc of=testfile
      4. f2fs_io decompress testfile
      
      To prevent the problem, let's check whether the cluster is fully
      allocated before redirtying its pages.
      
      Fixes: 5fdb322f ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE")
      Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
      Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com>
      Tested-by: Jaewook Kim <jw5454.kim@samsung.com>
      Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
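      The "fully allocated" guard described in this commit can be sketched as
      a standalone C function, assuming a cluster is represented as an array
      of block addresses and that NULL_ADDR is 0, as in f2fs.
      cluster_fully_allocated() and SKETCH_NULL_ADDR are hypothetical names;
      the actual patch's helper and its handling of compressed-cluster markers
      are not reproduced.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel for an unallocated (hole) block, mirroring f2fs NULL_ADDR. */
#define SKETCH_NULL_ADDR ((uint32_t)0)

/* A cluster is fully allocated only if none of its block addresses is
 * NULL_ADDR. Only such clusters are redirtied: writeback skips NULL_ADDR
 * blocks, so redirtying a sparse cluster would leave TOWRITE set forever. */
static bool cluster_fully_allocated(const uint32_t *blkaddr, size_t cluster_size)
{
    for (size_t i = 0; i < cluster_size; i++)
        if (blkaddr[i] == SKETCH_NULL_ADDR)
            return false;
    return true;
}
```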
    • f2fs: check discard support for conventional zones · 43aec4d0
      Shin'ichiro Kawasaki authored
      As the helper function f2fs_bdev_support_discard() shows, f2fs checks if
      the target block devices support discard by calling
      bdev_max_discard_sectors() and bdev_is_zoned(). This check works well
      for most cases, but it does not work for conventional zones on zoned
      block devices. F2fs assumes that zoned block devices support discard,
      and calls __submit_discard_cmd(). When __submit_discard_cmd() is called
      for sequential write required zones, it works fine since
      __submit_discard_cmd() issues zone reset commands instead of discard
      commands. However, when __submit_discard_cmd() is called for
      conventional zones, __blkdev_issue_discard() is called even when the
      devices do not support discard.
      
      The inappropriate __blkdev_issue_discard() call was not a problem before
      the commit 30f1e724 ("block: move discard checks into the ioctl
      handler") because __blkdev_issue_discard() checked if the target devices
      support discard or not. If not, it returned EOPNOTSUPP. After the
      commit, __blkdev_issue_discard() no longer checks it. It always returns
      zero and sets the given bio pointer to NULL. This NULL pointer triggers
      f2fs_bug_on() in __submit_discard_cmd(). The BUG can be reproduced with
      the commands below; it fires at the umount step. Here /dev/nullb0 is a
      zoned null_blk device with 5GB total size, 128MB zone size and 10
      conventional zones.
      
      $ mkfs.f2fs -f -m /dev/nullb0
      $ mount /dev/nullb0 /mnt
      $ for ((i=0;i<5;i++)); do dd if=/dev/zero of=/mnt/test bs=65536 count=1600 conv=fsync; done
      $ umount /mnt
      
      To fix the BUG, avoid the inappropriate __blkdev_issue_discard() call.
      When discard is requested for conventional zones, check if the device
      supports discard or not. If not, return EOPNOTSUPP.
      
      Fixes: 30f1e724 ("block: move discard checks into the ioctl handler")
      Cc: stable@vger.kernel.org
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
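      The decision this commit adds can be sketched as a small userspace
      function, assuming a simplified model in which each discard targets
      either a sequential-write-required zone or a conventional zone. All
      types and names below (struct sketch_dev, choose_discard_action(), the
      enums) are hypothetical and do not mirror the f2fs or block-layer API.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical zone and device description for this sketch. */
enum sketch_zone_type { ZONE_CONVENTIONAL, ZONE_SEQ_WRITE_REQUIRED };

struct sketch_dev {
    bool supports_discard;  /* e.g. derived from a non-zero max discard size */
};

enum discard_action { ACTION_ZONE_RESET, ACTION_DISCARD };

/* Choose how to handle a discard request on a zoned device. Returns 0 and
 * sets *action, or -EOPNOTSUPP for a conventional zone on a device that
 * cannot discard (the case that previously reached the NULL bio). */
static int choose_discard_action(const struct sketch_dev *dev,
                                 enum sketch_zone_type zt,
                                 enum discard_action *action)
{
    if (zt == ZONE_SEQ_WRITE_REQUIRED) {
        *action = ACTION_ZONE_RESET;   /* reset the zone instead of discarding */
        return 0;
    }
    if (!dev->supports_discard)
        return -EOPNOTSUPP;            /* conventional zone, no discard support */
    *action = ACTION_DISCARD;
    return 0;
}
```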
    • f2fs: fix to avoid use-after-free in f2fs_stop_gc_thread() · c7f114d8
      Chao Yu authored
      syzbot reports an f2fs bug as below:
      
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
       print_report+0xe8/0x550 mm/kasan/report.c:491
       kasan_report+0x143/0x180 mm/kasan/report.c:601
       kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
       instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
       atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:252 [inline]
       __refcount_add include/linux/refcount.h:184 [inline]
       __refcount_inc include/linux/refcount.h:241 [inline]
       refcount_inc include/linux/refcount.h:258 [inline]
       get_task_struct include/linux/sched/task.h:118 [inline]
       kthread_stop+0xca/0x630 kernel/kthread.c:704
       f2fs_stop_gc_thread+0x65/0xb0 fs/f2fs/gc.c:210
       f2fs_do_shutdown+0x192/0x540 fs/f2fs/file.c:2283
       f2fs_ioc_shutdown fs/f2fs/file.c:2325 [inline]
       __f2fs_ioctl+0x443a/0xbe60 fs/f2fs/file.c:4325
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:907 [inline]
       __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      The root cause is the race condition below, which may cause a
      use-after-free on the sbi->gc_th pointer.
      
      - remount
       - f2fs_remount
        - f2fs_stop_gc_thread
         - kfree(gc_th)
      				- f2fs_ioc_shutdown
      				 - f2fs_do_shutdown
      				  - f2fs_stop_gc_thread
      				   - kthread_stop(gc_th->f2fs_gc_task)
         : sbi->gc_thread = NULL;
      
      f2fs_do_shutdown() is called from two paths:
      - in the f2fs_ioc_shutdown() path, grab the sb->s_umount semaphore to
      fix the race;
      - the f2fs_shutdown() path is already safe, since its caller has
      grabbed the sb->s_umount semaphore.
      
      Reported-by: syzbot+1a8e2b31f2ac9bd3d148@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/linux-f2fs-devel/0000000000005c7ccb061e032b9b@google.com
      Fixes: 7950e9ac ("f2fs: stop gc/discard thread after fs shutdown")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: atomic: fix to truncate pagecache before on-disk metadata truncation · ebd3309a
      Chao Yu authored
      We should always truncate pagecache while truncating on-disk data.
      
      Fixes: a46bebd5 ("f2fs: synchronize atomic write aborts")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to wait page writeback before setting gcing flag · a4d7f2b3
      Chao Yu authored
      Soft IRQ				Thread
      - f2fs_write_end_io
      					- f2fs_defragment_range
      					 - set_page_private_gcing
       - type = WB_DATA_TYPE(page, false);
       : assign type w/ F2FS_WB_CP_DATA
       due to page_private_gcing() is true
        - dec_page_count() w/ wrong type
        - end_page_writeback()
      
      The F2FS_WB_CP_DATA reference count may become negative under the race
      condition above. The root cause is that we missed waiting for page
      writeback before setting the gcing page private flag; let's fix it.
      
      Fixes: 2d1fe8a8 ("f2fs: fix to tag gcing flag on page during file defragment")
      Fixes: 4961acdd ("f2fs: fix to tag gcing flag on page during block migration")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Create COW inode from parent dentry for atomic write · 8c1b7879
      Yeongjin Gil authored
      When an inode is renamed, i_pino in f2fs_inode_info still holds the
      previous parent's i_ino, which may cause f2fs_ioc_start_atomic_write to
      fail. If file_wrong_pino is true and i_nlink is 1, we should refer to
      the inode's dentry to find a valid pino.

      To resolve this issue, let's get the parent inode from the parent
      dentry directly.
      
      Fixes: 3db1de0e ("f2fs: change the current atomic write way")
      Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
      Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com>
      Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com>
      Reviewed-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Require FMODE_WRITE for atomic write ioctls · 4f5a100f
      Jann Horn authored
      The F2FS ioctls for starting and committing atomic writes check for
      inode_owner_or_capable(), but this does not give LSMs like SELinux or
      Landlock an opportunity to deny the write access - if the caller's FSUID
      matches the inode's UID, inode_owner_or_capable() immediately returns true.
      
      There are scenarios where LSMs want to deny a process the ability to write
      particular files, even files that the FSUID of the process owns; but this
      can currently partially be bypassed using atomic write ioctls in two ways:
      
       - F2FS_IOC_START_ATOMIC_REPLACE + F2FS_IOC_COMMIT_ATOMIC_WRITE can
         truncate an inode to size 0
       - F2FS_IOC_START_ATOMIC_WRITE + F2FS_IOC_ABORT_ATOMIC_WRITE can revert
         changes another process concurrently made to a file
      
      Fix it by requiring FMODE_WRITE for these operations, just like for
      F2FS_IOC_MOVE_RANGE. Since any legitimate caller should only be using these
      ioctls when intending to write into the file, that seems unlikely to break
      anything.
      
      Fixes: 88b88a66 ("f2fs: support atomic writes")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jann Horn <jannh@google.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
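      A sketch of the permission ordering this commit describes, assuming
      simplified stand-ins for fmode_t and for the result of
      inode_owner_or_capable(). The struct, function name and flag values are
      hypothetical; the -EBADF and -EACCES return values are an assumption
      modelled on how F2FS_IOC_MOVE_RANGE rejects descriptors that lack
      FMODE_WRITE.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for fmode_t bits; the values are assumptions for this sketch. */
#define SKETCH_FMODE_READ  0x1u
#define SKETCH_FMODE_WRITE 0x2u

struct sketch_file {
    unsigned int f_mode;    /* how the file descriptor was opened */
    bool owner_or_capable;  /* result of an ownership/capability check */
};

/* Gate for the atomic-write ioctls after the fix: the descriptor must have
 * been opened for writing, so an LSM that denied the write open cannot be
 * bypassed through file ownership alone. */
static int atomic_ioctl_permitted(const struct sketch_file *f)
{
    if (!(f->f_mode & SKETCH_FMODE_WRITE))
        return -EBADF;
    if (!f->owner_or_capable)
        return -EACCES;
    return 0;
}
```

      The ordering matters: the FMODE_WRITE test is evaluated first, so even
      an owner holding a read-only descriptor is refused.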
    • f2fs: clean up val{>>,<<}F2FS_BLKSIZE_BITS · 8fb9f319
      Zhiguo Niu authored
      Use F2FS_BYTES_TO_BLK(bytes) and F2FS_BLK_TO_BYTES(blk) in place of
      open-coded shifts by F2FS_BLKSIZE_BITS.
      Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
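      Assuming 4 KiB blocks (F2FS_BLKSIZE_BITS = 12), the two macros named in
      this commit reduce to plain shifts, roughly as below. These definitions
      are a sketch for illustration rather than copies of the kernel headers,
      and blocks_spanned() is a hypothetical caller.

```c
#include <stdint.h>

/* Assumes 4 KiB blocks; in the kernel the shift follows the block size. */
#define F2FS_BLKSIZE_BITS 12
#define F2FS_BYTES_TO_BLK(bytes) ((bytes) >> F2FS_BLKSIZE_BITS)
#define F2FS_BLK_TO_BYTES(blk)   ((blk) << F2FS_BLKSIZE_BITS)

/* Number of blocks touched by 'len' bytes starting at byte 'pos'
 * (len must be non-zero), written with the macros instead of raw shifts. */
static uint64_t blocks_spanned(uint64_t pos, uint64_t len)
{
    uint64_t first = F2FS_BYTES_TO_BLK(pos);
    uint64_t last = F2FS_BYTES_TO_BLK(pos + len - 1);

    return last - first + 1;
}
```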
  3. 15 Aug, 2024 8 commits
  4. 05 Aug, 2024 9 commits
  5. 04 Aug, 2024 6 commits
    • Linux 6.11-rc2 · de9c2c66
      Linus Torvalds authored
    • profiling: remove profile=sleep support · b88f5538
      Tetsuo Handa authored
      The kernel sleep profile is no longer working due to a recursive locking
      bug introduced by commit 42a20f86 ("sched: Add wrapper for get_wchan()
      to keep task blocked").
      
      Booting with the 'profile=sleep' kernel command line option added or
      executing
      
        # echo -n sleep > /sys/kernel/profiling
      
      after boot causes the system to lock up.
      
      Lockdep reports
      
        kthreadd/3 is trying to acquire lock:
        ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70
      
        but task is already holding lock:
        ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370
      
      with the call trace being
      
         lock_acquire+0xc8/0x2f0
         get_wchan+0x32/0x70
         __update_stats_enqueue_sleeper+0x151/0x430
         enqueue_entity+0x4b0/0x520
         enqueue_task_fair+0x92/0x6b0
         ttwu_do_activate+0x73/0x140
         try_to_wake_up+0x213/0x370
         swake_up_locked+0x20/0x50
         complete+0x2f/0x40
         kthread+0xfb/0x180
      
      However, since nobody noticed this regression for more than two years,
      let's remove 'profile=sleep' support based on the assumption that nobody
      needs this functionality.
      
      Fixes: 42a20f86 ("sched: Add wrapper for get_wchan() to keep task blocked")
      Cc: stable@vger.kernel.org # v5.16+
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a5dbd76a
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
      
       - Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver.
      
         A recent change in the ACPI code which consolidated code paths moved
         the invocation of init_freq_invariance_cppc() into a CPU hotplug
         handler. The first invocation on AMD CPUs ends up enabling a static
         branch, which deadlocks because the static branch enable tries to
         acquire cpu_hotplug_lock, but that lock is already held for write by
         the hotplug machinery.
      
         Use static_branch_enable_cpuslocked() instead and take the hotplug
         lock for read in the Intel code path, which is invoked from the
         architecture code outside of the CPU hotplug operations.
      
       - Fix the number of reserved bits in the sev_config structure bit field
         so that the bitfield does not exceed 64 bits.
      
       - Add missing Zen5 model numbers
      
       - Fix the alignment assumptions of pti_clone_pgtable() and
         clone_entry_text() on 32-bit:
      
         The code assumes PMD aligned code sections, but on 32-bit the kernel
         entry text is not PMD aligned. So depending on the code size and
         location, which is configuration and compiler dependent, entry text
         can cross a PMD boundary. As the start is not PMD aligned, adding the
         PMD size to the start address yields a value larger than the end
         address, which results in partially mapped entry code for user space.
         That causes endless recursion on the first entry from userspace
         (usually #PF).
      
         Cure this by aligning the start address in the addition so it ends up
         at the next PMD start address.
      
         clone_entry_text() enforces PMD mapping, but on 32-bit the tail might
         eventually be PTE mapped, which causes the mapping to fail because
         the PMD for the tail is not a large page mapping. Use
         PTI_LEVEL_KERNEL_IMAGE for the clone() invocation, which resolves to
         PTE on 32-bit and PMD on 64-bit.
      
       - Zero the 8-byte case for get_user() on range check failure on 32-bit
      
         The recent consolidation of the 8-byte get_user() case broke the
         zeroing in the failure case again. Restore it by clearing ECX before
         the range check rather than afterwards, as the latter obviously
         cannot be reached when the range check fails.
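      The pti_clone_pgtable() alignment problem described above is plain
      address arithmetic, so it can be illustrated in userspace. This sketch
      assumes a 2 MiB PMD (as with 32-bit PAE) and only counts how many
      PMD-sized steps a loop over [start, end) visits; count_naive() and
      count_aligned() are hypothetical and are not the patched kernel
      function.

```c
/* Assumed PMD size for this sketch: 2 MiB. */
#define PMD_SIZE (2UL << 20)
#define PMD_MASK (~(PMD_SIZE - 1))

/* Old assumption: step by PMD_SIZE from an unaligned start. If the range
 * crosses a PMD boundary, the second PMD can be skipped entirely. */
static unsigned count_naive(unsigned long start, unsigned long end)
{
    unsigned n = 0;

    for (unsigned long addr = start; addr < end; addr += PMD_SIZE)
        n++;
    return n;
}

/* Aligning inside the addition lands on the next PMD start address. */
static unsigned count_aligned(unsigned long start, unsigned long end)
{
    unsigned n = 0;

    for (unsigned long addr = start; addr < end;
         addr = (addr + PMD_SIZE) & PMD_MASK)
        n++;
    return n;
}
```

      For a range from 0x1F0000 to 0x210000, the naive loop visits one PMD and
      misses the one starting at 0x200000, while the aligned loop visits both.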
      
      * tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
        x86/mm: Fix pti_clone_entry_text() for i386
        x86/mm: Fix pti_clone_pgtable() alignment assumption
        x86/setup: Parse the builtin command line before merging
        x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range
        x86/sev: Fix __reserved field in sev_config
        x86/aperfmperf: Fix deadlock on cpu_hotplug_lock
    • Merge tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 61ca6c78
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "Two fixes for the timer/clocksource code:
      
         - The recent fix to make the takeover of the broadcast timer more
           reliable retrieves a per CPU pointer in preemptible context.

           This went unnoticed in testing as some compilers hoist the access
           into the non-preemptible section where the pointer is actually
           used, but compilers are entitled to perform the access where the
           code put it.

           Move it into the non-preemptible section right next to the actual
           usage site to cure it.
      
         - The clocksource watchdog is supposed to emit a warning when the
           retry count is greater than one and the number of retries reaches
           the limit.
      
           The condition is backwards and always warns when the count is
           greater than one. Fix up the condition to prevent spamming dmesg"
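      Based only on the description above, the warning condition can be
      sketched as two boolean helpers. The function names, and the assumption
      that the broken test joined the two comparisons with "or" instead of
      "and", are illustrative rather than quoted from cs_watchdog_read().

```c
#include <stdbool.h>

/* Broken form: warns on every retry count above one, regardless of limit. */
static bool should_warn_buggy(int nretries, int max_retries)
{
    return nretries > 1 || nretries >= max_retries;
}

/* Intended form: warn only when retries exceed one AND reach the limit. */
static bool should_warn_fixed(int nretries, int max_retries)
{
    return nretries > 1 && nretries >= max_retries;
}
```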
      
      * tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()
        tick/broadcast: Move per CPU pointer access into the atomic section
    • Merge tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6cc82dc2
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
      
       - When stime is larger than rtime due to accounting imprecision, then
         utime = rtime - stime becomes negative. As this is unsigned math, the
         result becomes a huge positive number.
      
         Cure it by resetting stime to rtime in that case, so utime becomes 0.
      
       - Restore consistent state when sched_cpu_deactivate() fails.
      
         When offlining a CPU fails in sched_cpu_deactivate() after the SMT
         present counter has been decremented, then the function aborts but
         fails to increment the SMT present counter and leaves it imbalanced.
         Consecutive operations cause it to underflow. Add the missing fixup
         for the error path.
      
         For SMT accounting the runqueue needs to be marked online again in
         the error exit path to restore consistent state.
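      The first scheduler fix above comes down to unsigned subtraction. The
      sketch assumes only the final utime = rtime - stime step matters; the
      kernel's cputime scaling (and the mul_u64_u64_div_u64() change) is not
      reproduced, and utime_naive() and utime_clamped() are hypothetical
      names.

```c
#include <stdint.h>

/* Naive split: wraps around to a huge value when stime > rtime. */
static uint64_t utime_naive(uint64_t rtime, uint64_t stime)
{
    return rtime - stime;
}

/* Fix as described: reset stime to rtime first, so utime becomes 0. */
static uint64_t utime_clamped(uint64_t rtime, uint64_t *stime)
{
    if (*stime > rtime)
        *stime = rtime;
    return rtime - *stime;
}
```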
      
      * tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/core: Fix unbalance set_rq_online/offline() in sched_cpu_deactivate()
        sched/core: Introduce sched_set_rq_on/offline() helper
        sched/smt: Fix unbalance sched_smt_present dec/inc
        sched/smt: Introduce sched_smt_present_inc/dec() helper
        sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime
    • Merge tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1ddeb0ef
      Linus Torvalds authored
      Pull x86 perf fixes from Thomas Gleixner:
      
       - Move the smp_processor_id() invocation back into the non-preemptible
         region, so that the result is valid to use
      
       - Add the missing package C2 residency counters for Sierra Forest CPUs
         to make the newly added support actually useful
      
      * tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86: Fix smp_processor_id()-in-preemptible warnings
        perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest