1. 13 Mar, 2019 20 commits
    • f2fs: fix to adapt small inline xattr space in __find_inline_xattr() · 2c28aba8
      Chao Yu authored
      With the test case below, we fail to find an existing xattr entry:
      
      1. mkfs.f2fs -O extra_attr -O flexible_inline_xattr /dev/zram0
      2. mount -t f2fs -o inline_xattr_size=1 /dev/zram0 /mnt/f2fs/
      3. touch /mnt/f2fs/file
      4. setfattr -n "user.name" -v 0 /mnt/f2fs/file
      5. getfattr -n "user.name" /mnt/f2fs/file
      
      /mnt/f2fs/file: user.name: No such attribute
      
      The reason is that, for an inode with a very small inline xattr size,
      __find_inline_xattr() fails to traverse any entry, because the first
      entry may not have been loaded from the xattr node yet; later,
      __find_xattr() may skip checking the entire xattr data, which leads to
      the wrong "No such attribute" result.
      
      This patch adds a check for this case to avoid the issue.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
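      A userspace model can make the failure mode concrete. It uses a hypothetical, simplified encoding ([name_len][value_len][name][value], with name_len == 0 as the end marker) instead of the real f2fs xattr entry layout, and the function and enum names are illustrative only. The point it shows is that running out of inline space before reaching the end marker should mean "continue in the xattr node", not "no such attribute".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Outcome of searching the inline xattr area only. */
enum inline_lookup {
    INLINE_FOUND,     /* entry found inside the inline area */
    INLINE_ABSENT,    /* end-of-list marker seen: attribute does not exist */
    INLINE_CONTINUE,  /* inline area exhausted: keep searching the xattr node */
};

/*
 * Hypothetical encoding: [name_len][value_len][name bytes][value bytes],
 * repeated; name_len == 0 marks the end of the list.
 */
static enum inline_lookup find_inline_xattr(const uint8_t *buf, size_t size,
                                            const char *name)
{
    size_t want = strlen(name);
    size_t off = 0;

    while (off + 2 <= size) {            /* entry header must fit inline */
        size_t nlen = buf[off];
        size_t vlen = buf[off + 1];

        if (nlen == 0)
            return INLINE_ABSENT;
        if (off + 2 + nlen + vlen > size)
            return INLINE_CONTINUE;      /* entry spills past inline area */
        if (nlen == want && memcmp(buf + off + 2, name, want) == 0)
            return INLINE_FOUND;
        off += 2 + nlen + vlen;
    }
    /*
     * The area is too small to hold even the next entry header (as with a
     * very small inline_xattr_size). Reporting ABSENT here would reproduce
     * the bug; the caller must go on to search the xattr node instead.
     */
    return INLINE_CONTINUE;
}
```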
    • f2fs: fix to do sanity check with inode.i_inline_xattr_size · dd6c89b5
      Chao Yu authored
      As Paul Bandha reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202709
      
      When I run the PoC on the mounted f2fs image, I get a buffer overflow
      in read_inline_xattr() because there is no sanity check on the value
      of i_inline_xattr_size.

      I created the image by simply modifying the value of
      i_inline_xattr_size in the inode:
      
      i_name                        		[test1.txt]
      i_ext: fofs:0 blkaddr:0 len:0
      i_extra_isize                 		[0x      18 : 24]
      i_inline_xattr_size           		[0x    ffff : 65535]
      i_addr[ofs]                   		[0x       0 : 0]
      
      mkdir /mnt/f2fs
      mount ./f2fs1.img /mnt/f2fs
      gcc poc.c -o poc
      ./poc
      
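      #include <errno.h>
      #include <stdio.h>
      #include <sys/syscall.h>
      #include <unistd.h>

      int main(void) {
      	/* NULL buffer and size 0: ask only for the size of the xattr list */
      	long y = syscall(SYS_listxattr, "/mnt/f2fs/test1.txt", NULL, 0);
      	printf("ret %ld\n", y);
      	printf("errno: %d\n", errno);
      	return 0;
      }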
      
       BUG: KASAN: slab-out-of-bounds in read_inline_xattr+0x18f/0x260
       Read of size 262140 at addr ffff88011035efd8 by task f2fs1poc/3263
      
       CPU: 0 PID: 3263 Comm: f2fs1poc Not tainted 4.18.0-custom #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
       Call Trace:
        dump_stack+0x71/0xab
        print_address_description+0x83/0x250
        kasan_report+0x213/0x350
        memcpy+0x1f/0x50
        read_inline_xattr+0x18f/0x260
        read_all_xattrs+0xba/0x190
        f2fs_listxattr+0x9d/0x3f0
        listxattr+0xb2/0xd0
        path_listxattr+0x93/0xe0
        do_syscall_64+0x9d/0x220
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Let's add a sanity check on inode.i_inline_xattr_size in f2fs_iget()
      to avoid this issue.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
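      The unit matters here: i_inline_xattr_size counts 4-byte words, which is why the bogus value 0xffff in the report becomes a 262140-byte read (65535 × 4) in the KASAN splat. The sketch below shows a check of that general shape; the upper bound is passed in as a parameter because the real limit is derived from f2fs inode layout constants that this sketch does not reproduce, and treating 0 as invalid is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* i_inline_xattr_size is expressed in 4-byte (__le32-sized) words. */
static size_t inline_xattr_bytes(uint16_t words)
{
    return (size_t)words * sizeof(uint32_t);
}

/*
 * Reject sizes that would make the inline xattr read run past the inode.
 * 'max_words' stands in for the real bound computed from the inode layout.
 */
static bool inline_xattr_size_valid(uint16_t words, uint16_t max_words)
{
    return words != 0 && words <= max_words;
}
```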
    • f2fs: give some messages for inline_xattr_size · 70db5b04
      Jaegeuk Kim authored
      This patch prints kernel messages when the user sets an invalid inline_xattr_size.
      
      Fixes: 500e0b28 ("f2fs: fix to check inline_xattr_size boundary correctly")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't trigger read IO for beyond EOF page · 86109c90
      Chao Yu authored
      In f2fs_mpage_readpages(), a page beyond EOF should simply be zeroed
      out, but we consulted the previous mapping info without first checking
      the file size boundary. Fix it by checking EOF first.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
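      The ordering the fix establishes can be sketched as a per-block decision: compare the block against EOF first, and only then reuse the mapping cached from the previous block. The struct and enum names are hypothetical and much simpler than the real f2fs_map_blocks state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum read_action { READ_ZERO_FILL, READ_FROM_CACHED_MAP, READ_NEED_MAP };

/* Hypothetical cached result of the previous block-mapping lookup. */
struct cached_map {
    uint64_t lblk;   /* first logical block covered */
    uint64_t len;    /* number of blocks covered */
    bool valid;
};

static enum read_action classify_block(uint64_t block_in_file, uint64_t i_size,
                                       unsigned blkbits,
                                       const struct cached_map *map)
{
    uint64_t blocksize = 1ULL << blkbits;
    uint64_t last_block_in_file = (i_size + blocksize - 1) >> blkbits;

    /* EOF check comes first: a block past EOF is zero-filled, no read IO. */
    if (block_in_file >= last_block_in_file)
        return READ_ZERO_FILL;

    /* Only now is it safe to reuse the previous mapping. */
    if (map->valid && block_in_file >= map->lblk &&
        block_in_file < map->lblk + map->len)
        return READ_FROM_CACHED_MAP;

    return READ_NEED_MAP;
}
```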
    • f2fs: fix to add refcount once page is tagged PG_private · 240a5915
      Chao Yu authored
      As Gao Xiang reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202749
      
      f2fs may skip pageout() due to incorrect page reference count.
      
      The problem is that MM defines the rule [1] very clearly: once a page
      has the PG_private flag set, its refcount should be incremented, and
      main flows like pageout() and migrate_page() assume there is one
      additional page reference if page_has_private() returns true.
      
      But currently, f2fs doesn't take or drop a reference when changing the
      PG_private flag. f2fs should follow MM's rule so that the related MM
      flows run as expected.
      
      [1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com/
      Reported-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
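      The rule can be modelled in userspace: setting PG_private takes one extra reference and clearing it drops that reference, so code that assumes "one extra reference when page_has_private()" sees consistent counts. The struct and helper names are hypothetical; this is a model, not the kernel page API.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for struct page: only the fields the rule needs. */
struct model_page {
    int refcount;
    bool has_private;        /* models PG_private */
    unsigned long private;   /* models page_private() */
};

/* Take a reference when PG_private is first set, as MM expects. */
static void model_set_page_private(struct model_page *page, unsigned long data)
{
    if (page->has_private)
        return;
    page->refcount++;            /* models get_page() */
    page->has_private = true;
    page->private = data;
}

/* Drop that reference when PG_private is cleared. */
static void model_clear_page_private(struct model_page *page)
{
    if (!page->has_private)
        return;
    page->private = 0;
    page->has_private = false;
    page->refcount--;            /* models put_page() */
}
```

      Both helpers are idempotent, so a second set or clear cannot unbalance the count.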
    • f2fs: remove wrong comment in f2fs_invalidate_page() · 25720cc0
      Chao Yu authored
      Since commit 8c242db9 ("f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer"),
      we no longer skip clearing the private flag when truncating atomic_write
      pages, so remove the old, wrong comment in f2fs_invalidate_page().
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to use kvfree instead of kzfree · 2a6a7e72
      Chao Yu authored
      As Jiqun Li reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202747
      
      The system can panic because the xattr interface uses a mismatched
      allocate/free pair:
      - kvmalloc() to allocate memory
      - kzfree() to free memory
      
      Let's use kvfree() instead of kzfree(). It is also safe to drop the
      zeroing that kzfree() does: no confidential data is stored as xattrs,
      so the memory does not need to be zeroed before it is freed.
      
      Fixes: 5222595d ("f2fs: use kvmalloc, if kmalloc is failed")
      Reported-by: Jiqun Li <jiqun.li@unisoc.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
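      The mismatch can be modelled in userspace. In this sketch (all names hypothetical, both paths backed by malloc()), model_kvmalloc() records whether a request would be served by the slab or fall back to vmalloc, model_kzfree() only accepts slab memory, and model_kvfree() accepts both. The 4096-byte fallback threshold is an arbitrary assumption, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold above which the model "falls back to vmalloc". */
#define MODEL_SLAB_LIMIT 4096u

enum origin { ORIGIN_SLAB, ORIGIN_VMALLOC };

/* Bookkeeping header placed in front of every model allocation. */
struct model_hdr {
    enum origin origin;
    size_t size;
};

/* Model of kvmalloc(): small requests from "slab", large from "vmalloc". */
static void *model_kvmalloc(size_t size)
{
    struct model_hdr *h = malloc(sizeof(*h) + size);

    if (!h)
        return NULL;
    h->origin = size <= MODEL_SLAB_LIMIT ? ORIGIN_SLAB : ORIGIN_VMALLOC;
    h->size = size;
    return h + 1;
}

static enum origin model_origin(const void *p)
{
    return ((const struct model_hdr *)p - 1)->origin;
}

/* Model of kzfree(): only valid for slab memory. Returns -1 and frees
 * nothing when handed "vmalloc" memory, which in the kernel is a bug. */
static int model_kzfree(void *p)
{
    struct model_hdr *h = (struct model_hdr *)p - 1;

    if (h->origin != ORIGIN_SLAB)
        return -1;
    memset(p, 0, h->size);       /* zero before freeing, like kzfree() */
    free(h);
    return 0;
}

/* Model of kvfree(): accepts memory from either path. */
static void model_kvfree(void *p)
{
    if (p)
        free((struct model_hdr *)p - 1);
}
```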
    • f2fs: print more parameters in trace_f2fs_map_blocks · 76630f20
      Chao Yu authored
      Print more parameters in trace_f2fs_map_blocks for better map_blocks tracing.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: trace f2fs_ioc_shutdown · 559e87c4
      Chao Yu authored
      This patch adds support for tracing f2fs_ioc_shutdown.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to avoid deadlock of atomic file operations · 48432984
      Chao Yu authored
      Thread A				Thread B
      - __fput
       - f2fs_release_file
        - drop_inmem_pages
         - mutex_lock(&fi->inmem_lock)
         - __revoke_inmem_pages
          - lock_page(page)
      					- open
      					- f2fs_setattr
      					- truncate_setsize
      					 - truncate_inode_pages_range
      					  - lock_page(page)
      					  - truncate_cleanup_page
      					   - f2fs_invalidate_page
      					    - drop_inmem_page
      					    - mutex_lock(&fi->inmem_lock);
      
      We may encounter the ABBA deadlock above, as reported by Kyungtae Kim:
      
      I'm reporting a bug in linux-4.17.19: "INFO: task hung in
      drop_inmem_page" (no reproducer)
      
      I think this might be somehow related to the following:
      https://groups.google.com/forum/#!searchin/syzkaller-bugs/INFO$3A$20task$20hung$20in$20%7Csort:date/syzkaller-bugs/c6soBTrdaIo/AjAzPeIzCgAJ
      
      =========================================
      INFO: task syz-executor7:10822 blocked for more than 120 seconds.
            Not tainted 4.17.19 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      syz-executor7   D27024 10822   6346 0x00000004
      Call Trace:
       context_switch kernel/sched/core.c:2867 [inline]
       __schedule+0x721/0x1e60 kernel/sched/core.c:3515
       schedule+0x88/0x1c0 kernel/sched/core.c:3559
       schedule_preempt_disabled+0x18/0x30 kernel/sched/core.c:3617
       __mutex_lock_common kernel/locking/mutex.c:833 [inline]
       __mutex_lock+0x5bd/0x1410 kernel/locking/mutex.c:893
       mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:908
       drop_inmem_page+0xcb/0x810 fs/f2fs/segment.c:327
       f2fs_invalidate_page+0x337/0x5e0 fs/f2fs/data.c:2401
       do_invalidatepage mm/truncate.c:165 [inline]
       truncate_cleanup_page+0x261/0x330 mm/truncate.c:187
       truncate_inode_pages_range+0x552/0x1610 mm/truncate.c:367
       truncate_inode_pages mm/truncate.c:478 [inline]
       truncate_pagecache+0x6d/0x90 mm/truncate.c:801
       truncate_setsize+0x81/0xa0 mm/truncate.c:826
       f2fs_setattr+0x44f/0x1270 fs/f2fs/file.c:781
       notify_change+0xa62/0xe80 fs/attr.c:313
       do_truncate+0x12e/0x1e0 fs/open.c:63
       do_last fs/namei.c:2955 [inline]
       path_openat+0x2042/0x29f0 fs/namei.c:3505
       do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
       do_sys_open+0x35e/0x4e0 fs/open.c:1101
       __do_sys_open fs/open.c:1119 [inline]
       __se_sys_open fs/open.c:1114 [inline]
       __x64_sys_open+0x89/0xc0 fs/open.c:1114
       do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4497b9
      RSP: 002b:00007f734e459c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
      RAX: ffffffffffffffda RBX: 00007f734e45a6cc RCX: 00000000004497b9
      RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
      RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e45a700
      INFO: task syz-executor7:10858 blocked for more than 120 seconds.
            Not tainted 4.17.19 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      syz-executor7   D28880 10858   6346 0x00000004
      Call Trace:
       context_switch kernel/sched/core.c:2867 [inline]
       __schedule+0x721/0x1e60 kernel/sched/core.c:3515
       schedule+0x88/0x1c0 kernel/sched/core.c:3559
       __rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:565 [inline]
       rwsem_down_write_failed+0x5e6/0xc90 kernel/locking/rwsem-xadd.c:594
       call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
       __down_write arch/x86/include/asm/rwsem.h:142 [inline]
       down_write+0x58/0xa0 kernel/locking/rwsem.c:72
       inode_lock include/linux/fs.h:713 [inline]
       do_truncate+0x120/0x1e0 fs/open.c:61
       do_last fs/namei.c:2955 [inline]
       path_openat+0x2042/0x29f0 fs/namei.c:3505
       do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
       do_sys_open+0x35e/0x4e0 fs/open.c:1101
       __do_sys_open fs/open.c:1119 [inline]
       __se_sys_open fs/open.c:1114 [inline]
       __x64_sys_open+0x89/0xc0 fs/open.c:1114
       do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4497b9
      RSP: 002b:00007f734e3b4c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
      RAX: ffffffffffffffda RBX: 00007f734e3b56cc RCX: 00000000004497b9
      RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
      RBP: 000000000071c238 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e3b5700
      INFO: task syz-executor5:10829 blocked for more than 120 seconds.
            Not tainted 4.17.19 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      syz-executor5   D28760 10829   6308 0x80000002
      Call Trace:
       context_switch kernel/sched/core.c:2867 [inline]
       __schedule+0x721/0x1e60 kernel/sched/core.c:3515
       schedule+0x88/0x1c0 kernel/sched/core.c:3559
       io_schedule+0x21/0x80 kernel/sched/core.c:5179
       wait_on_page_bit_common mm/filemap.c:1100 [inline]
       __lock_page+0x2b5/0x390 mm/filemap.c:1273
       lock_page include/linux/pagemap.h:483 [inline]
       __revoke_inmem_pages+0xb35/0x11c0 fs/f2fs/segment.c:231
       drop_inmem_pages+0xa3/0x3e0 fs/f2fs/segment.c:306
       f2fs_release_file+0x2c7/0x330 fs/f2fs/file.c:1556
       __fput+0x2c7/0x780 fs/file_table.c:209
       ____fput+0x1a/0x20 fs/file_table.c:243
       task_work_run+0x151/0x1d0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x8ba/0x30a0 kernel/exit.c:865
       do_group_exit+0x13b/0x3a0 kernel/exit.c:968
       get_signal+0x6bb/0x1650 kernel/signal.c:2482
       do_signal+0x84/0x1b70 arch/x86/kernel/signal.c:810
       exit_to_usermode_loop+0x155/0x190 arch/x86/entry/common.c:162
       prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
       do_syscall_64+0x445/0x4e0 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4497b9
      RSP: 002b:00007f1c68e74ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
      RAX: fffffffffffffe00 RBX: 000000000071bf80 RCX: 00000000004497b9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000071bf80
      RBP: 000000000071bf80 R08: 0000000000000000 R09: 000000000071bf58
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 00007f1c68e759c0 R15: 00007f1c68e75700
      
      This patch uses trylock_page() to mitigate this deadlock.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
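      The idea behind the fix can be shown with two POSIX mutexes standing in for fi->inmem_lock and the page lock (the function name is hypothetical and this is not f2fs code). Thread A already holds inmem_lock when it needs the page lock, so it uses a non-blocking trylock and backs off instead of waiting; that breaks the lock-order cycle shown in the thread diagram.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * 'inmem_lock' models fi->inmem_lock (lock A) and 'page_lock' models the
 * page lock (lock B). Returns false if the page was busy; the caller can
 * drop its locks and retry later instead of blocking while holding A.
 */
static bool revoke_one_page(pthread_mutex_t *inmem_lock,
                            pthread_mutex_t *page_lock)
{
    bool revoked = false;

    pthread_mutex_lock(inmem_lock);                /* lock A */
    if (pthread_mutex_trylock(page_lock) == 0) {   /* try lock B, never block */
        /* ... revoke the in-memory page here ... */
        revoked = true;
        pthread_mutex_unlock(page_lock);
    }
    pthread_mutex_unlock(inmem_lock);
    return revoked;
}
```

      With a plain lock of page_lock in the same spot, a thread holding the page lock and waiting for inmem_lock would deadlock against this one.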
    • f2fs: fix to dirty inode for i_mode recovery · ca597bdd
      Chao Yu authored
      As Seulbae Kim reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202637
      
      The permission field was not recovered correctly after a sudden power
      cut. The reason is that setattr did not add the inode to the global
      dirty list when i_mode changed, so a later checkpoint triggered by
      fsync would not flush the latest i_mode to disk. Fix it.
      Reported-by: Seulbae Kim <seulbae@gatech.edu>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: give random value to i_generation · 428e3bcf
      Jaegeuk Kim authored
      Following commit 23253068 ("ext4: improve smp scalability for inode
      generation"), give i_generation a random value.

      This can be used as the DUN (data unit number) for UFS hardware encryption.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: no need to take page lock in readdir · 613f3dcd
      Gao Xiang authored
      The VFS takes inode_lock for readdir, so there is no need to take the
      page lock in readdir at all, just as in most other generic
      filesystems.

      This patch improves concurrency, since .iterate_shared was introduced
      to the VFS years ago.
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to update iostat correctly in IPU path · e46f6bd8
      Chao Yu authored
      In the error path of IPU (in-place update), iostat was not accounted correctly; fix it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix encrypted page memory leak · 6492a335
      Chao Yu authored
      In the IPU path of f2fs_do_write_data_page(), the error path needs to
      release the encrypted page and the fscrypt context; otherwise, memory
      is leaked.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
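      A minimal sketch of the error-path rule, assuming a hypothetical context that owns a single encrypted bounce buffer (the fscrypt context is not modelled, and submit_fn stands in for block-layer submission): once the write fails, nothing else will release the buffer, so the error path must.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-write context holding the encrypted bounce page. */
struct ipu_ctx {
    void *encrypted_page;
};

/* Stand-in for block-layer submission; returns 0 or a negative errno. */
typedef int (*submit_fn)(void *buf, size_t len);

static int ipu_write(struct ipu_ctx *ctx, const void *plain, size_t len,
                     submit_fn submit)
{
    int err;

    ctx->encrypted_page = malloc(len);
    if (!ctx->encrypted_page)
        return -ENOMEM;
    memcpy(ctx->encrypted_page, plain, len);   /* stand-in for encryption */

    err = submit(ctx->encrypted_page, len);
    if (err) {
        /* Error path: nothing else will free the bounce page now. */
        free(ctx->encrypted_page);
        ctx->encrypted_page = NULL;
        return err;
    }
    /* Success: ownership passes to I/O completion (here, the caller). */
    return 0;
}

static int submit_ok(void *buf, size_t len)   { (void)buf; (void)len; return 0; }
static int submit_fail(void *buf, size_t len) { (void)buf; (void)len; return -EIO; }
```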
    • f2fs: make fault injection covering __submit_flush_wait() · dc37910d
      Chao Yu authored
      This patch allows f2fs_bio_alloc() to fail in __submit_flush_wait(),
      so that fault injection can simulate a flush error during checkpoint
      and cover more error paths.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
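      A rough userspace model of the idea: a rate-based injector, loosely modelled on f2fs fault injection (the counting scheme and all names here are assumptions), makes the allocation inside the flush path fail periodically, so the caller's error handling actually gets exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical injector: fail once every 'rate' calls (0 disables it). */
struct fault_injector {
    unsigned rate;
    unsigned ops;
};

static bool time_to_inject(struct fault_injector *fi)
{
    if (fi->rate == 0)
        return false;
    if (++fi->ops >= fi->rate) {
        fi->ops = 0;
        return true;
    }
    return false;
}

/* Model of a bio allocation that fault injection is allowed to fail. */
static void *model_bio_alloc(struct fault_injector *fi, size_t size)
{
    if (time_to_inject(fi))
        return NULL;
    return malloc(size);
}

/* Model of the flush path: an allocation failure becomes an error that
 * the checkpoint code has to handle. */
static int model_submit_flush_wait(struct fault_injector *fi)
{
    void *bio = model_bio_alloc(fi, 64);

    if (!bio)
        return -ENOMEM;
    free(bio);          /* stand-in for submitting and waiting on the flush */
    return 0;
}
```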
    • f2fs: fix to retry fill_super only if recovery failed · aa2c8c43
      Chao Yu authored
      With the current retry mechanism in f2fs_fill_super(), if the first
      fill_super attempt fails due to lack of memory, the second attempt runs
      without recovery; if it succeeds, we may lose fsynced data, which
      doesn't make sense.

      Let's retry fill_super only if a non-ENOMEM error occurs during
      recovery.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
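      The retry rule reduces to a small predicate. This sketches the policy as described above, with a hypothetical function name; it is not the f2fs_fill_super() code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Retry (without recovery) only when recovery itself failed with an error
 * other than -ENOMEM. An -ENOMEM failure is returned as-is instead of
 * falling back to a mount that skips recovery of fsynced data.
 */
static bool should_retry_fill_super(int err, bool failed_during_recovery)
{
    return failed_during_recovery && err != -ENOMEM;
}
```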
    • f2fs: silence VM_WARN_ON_ONCE in mempool_alloc · bc73a4b2
      Gao Xiang authored
      Note that __GFP_ZERO is not supported by mempool_alloc(), as is also
      documented in the mempool_alloc() comments.
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: correct spelling mistake · 68b79cdc
      Zeng Guangyue authored
      Correct the spelling mistake "nunmber".
      Signed-off-by: Zeng Guangyue <zengguangyue@hisilicon.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix wrong #endif · 0af725fc
      Jaegeuk Kim authored
      The last #endif has to cover the whole header file.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 06 Mar, 2019 8 commits
    • f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG · fb40d618
      Jaegeuk Kim authored
      Once this flag has been set, let only fsck.f2fs clear it.
      Note that this addresses all the subtle fault injection tests.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't allow negative ->write_io_size_bits · 6d52e135
      Chao Yu authored
      As Dan reported:
      
      "We put an upper bound on ->write_io_size_bits but we don't have a lower
      bound."
      
      So let's add a lower-bound check for ->write_io_size_bits in parse_options().

      [We don't allow configuring ->write_io_size_bits as zero, since we need
      to fill at least one dummy page for aligned IO.]
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
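      A lower bound also matters in plain C terms if the option is used as a shift count, because shifting by a negative amount is undefined behaviour. The sketch below assumes the value becomes a page count via 1 << bits (an assumption, not a quote of f2fs code) and takes the existing upper bound as a parameter; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Accept 1..max_bits; zero and negative values are rejected. */
static bool write_io_size_bits_valid(int bits, int max_bits)
{
    return bits > 0 && bits <= max_bits;
}

/* Only call after validation: a negative shift count is undefined. */
static unsigned io_size_pages(int bits)
{
    return 1u << bits;
}
```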
    • f2fs: fix to check inline_xattr_size boundary correctly · 500e0b28
      Chao Yu authored
      We use the condition below to check the inline_xattr_size boundary:
      
      	if (!F2FS_OPTION(sbi).inline_xattr_size ||
      		F2FS_OPTION(sbi).inline_xattr_size >=
      				DEF_ADDRS_PER_INODE -
      				F2FS_TOTAL_EXTRA_ATTR_SIZE -
      				DEF_INLINE_RESERVED_SIZE -
      				DEF_MIN_INLINE_SIZE)
      
      There are three problems with that check:
      - We should allow an inline_xattr_size that leaves exactly the minimum
      inline {data,dentry} area, i.e. the boundary value itself should be
      accepted rather than rejected by '>='.
      - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size use different
      units: F2FS_TOTAL_EXTRA_ATTR_SIZE is in bytes, while inline_xattr_size
      is in 4-byte words.
      - DEF_MIN_INLINE_SIZE only indicates the minimum size of the inline
      data area; we need to consider the minimum size of the inline dentry
      area as well. A minimal inline dentry must contain at least two
      entries, '.' and '..', so the minimum inline_dentry size is 40 bytes:
      
      .bitmap      1 *  1 =  1
      .reserved    1 *  1 =  1
      .dentry     11 *  2 = 22
      .filename    8 *  2 = 16
      total               40
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
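      The 40-byte figure, and a check that keeps every term in one unit, can be written out as follows. The layout constants are passed as parameters (with made-up values in the test) because this sketch does not reproduce the real DEF_ADDRS_PER_INODE, F2FS_TOTAL_EXTRA_ATTR_SIZE or DEF_INLINE_RESERVED_SIZE values; the extra-attribute size is converted from bytes to 4-byte words before it is subtracted, and the boundary itself is accepted. Function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DENTRY_BYTES 11u   /* one directory entry, per the table above */
#define SLOT_BYTES    8u   /* one filename slot, per the table above */

/* Minimum inline dentry area holding "." and "..". */
static unsigned min_inline_dentry_bytes(void)
{
    const unsigned nr = 2;               /* "." and ".." */
    unsigned bitmap = (nr + 7) / 8;      /* one bit per entry, rounded up */
    unsigned reserved = 1;               /* value taken from the table above */

    return bitmap + reserved + nr * DENTRY_BYTES + nr * SLOT_BYTES;
}

/* Boundary check with every term in 4-byte words. */
static bool inline_xattr_size_ok(unsigned xattr_words, unsigned addrs_words,
                                 unsigned extra_attr_bytes,
                                 unsigned reserved_words,
                                 unsigned min_inline_words)
{
    unsigned max_words = addrs_words - extra_attr_bytes / 4u
                         - reserved_words - min_inline_words;

    /* '<=' here: the boundary value itself is allowed. */
    return xattr_words > 0 && xattr_words <= max_words;
}
```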
    • f2fs: do not use mutex lock in atomic context · 9083977d
      Sahitya Tummala authored
      Fix the warning below, which is caused by taking a mutex in atomic context.
      
      BUG: sleeping function called from invalid context at kernel/locking/mutex.c:98
      in_atomic(): 1, irqs_disabled(): 0, pid: 585, name: sh
      Preemption disabled at: __radix_tree_preload+0x28/0x130
      Call trace:
       dump_backtrace+0x0/0x2b4
       show_stack+0x20/0x28
       dump_stack+0xa8/0xe0
       ___might_sleep+0x144/0x194
       __might_sleep+0x58/0x8c
       mutex_lock+0x2c/0x48
       f2fs_trace_pid+0x88/0x14c
       f2fs_set_node_page_dirty+0xd0/0x184
      
      Do not use f2fs_radix_tree_insert(), to avoid calling cond_resched()
      while holding a spinlock.
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix potential data inconsistence of checkpoint · c42d28ce
      Chao Yu authored
      Previously, we changed the lock from cp_rwsem to node_change, which
      solved the deadlock caused by the race condition below:
      
      Thread A			Thread B
      - f2fs_setattr
       - f2fs_lock_op  -- read_lock
       - dquot_transfer
        - __dquot_transfer
         - dquot_acquire
          - commit_dqblk
           - f2fs_quota_write
            - f2fs_write_begin
             - f2fs_write_failed
      				- write_checkpoint
      				 - block_operations
      				  - f2fs_lock_all  -- write_lock
              - f2fs_truncate_blocks
               - f2fs_lock_op  -- read_lock
      
      But that breaks the semantics of cp_rwsem in other callers, such as:
      - f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed
      - f2fs_direct_IO -> f2fs_write_failed

      There, we allow truncating dnodes without cp_rwsem held, which results
      in incorrect SIT bitmap updates and can cause further data corruption.

      So this patch reverts the previous fix and instead avoids the deadlock
      by skipping the f2fs_truncate_blocks() call in f2fs_write_failed() for
      quota files only, keeping the preallocated data/node blocks at the tail
      of the quota file; we expect that preallocated space to be used to
      store quota info soon afterwards.
      
      Fixes: af033b2a ("f2fs: guarantee journalled quota data by checkpoint")
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: jump to label 'free_node_inode' when failing from d_make_root() · 025cdb16
      Chengguang Xu authored
      When sb->s_root is NULL, dput() does nothing, so jump to label
      'free_node_inode' instead of label 'free_root_inode' when
      d_make_root() fails.
      Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to document inline_xattr_size option · 7321dd97
      Chao Yu authored
      We forgot to document the inline_xattr_size mount option in f2fs.txt;
      add it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to data block override node segment by mistake · a0770e13
      zhengliang authored
      v4: Rearrange the previous three versions.
      
      The following scenario can lead to a data block overriding a node segment by mistake.
      
      TASK A            |  TASK kworker                                            |     TASK B                                            |       TASK C
                        |                                                          |                                                       |
      open              |                                                          |                                                       |
      write             |                                                          |                                                       |
      close             |                                                          |                                                       |
                        |  f2fs_write_data_pages                                   |                                                       |
                        |    f2fs_write_cache_pages                                |                                                       |
                        |      f2fs_outplace_write_data                            |                                                       |
                        |        f2fs_allocate_data_block (get block in seg S,     |                                                       |
                        |                                  S is full, and only     |                                                       |
                        |                                  have this valid data    |                                                       |
                        |                                  block)                  |                                                       |
                        |          allocate_segment                                |                                                       |
                        |          locate_dirty_segment (mark S as PRE)            |                                                       |
                        |        f2fs_submit_page_write (submit but is not         |                                                       |
                        |                                written on dev)           |                                                       |
      unlink            |                                                          |                                                       |
       iput_final       |                                                          |                                                       |
        f2fs_drop_inode |                                                          |                                                       |
          f2fs_truncate |                                                          |                                                       |
       (not evict)      |                                                          |                                                       |
                        |                                                          | write_checkpoint                                      |
                        |                                                          |  flush merged bio but not wait file data writeback    |
                        |                                                          |  set_prefree_as_free (mark S as FREE)                 |
                        |                                                          |                                                       | update NODE/DATA
                        |                                                          |                                                       | allocate_segment (select S)
                        |     writeback done                                       |                                                       |
      
      So we need to guarantee that I/O has completed before truncating the inode in f2fs_drop_inode().
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
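      The ordering the fix requires can be modelled with a counter of in-flight writeback pages and a condition variable: the truncate step waits until the counter drops to zero, so the inode's blocks (and the segment holding them) cannot be freed and reused while I/O to them is still outstanding. This is a hypothetical userspace model using POSIX threads, not f2fs code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of an inode with in-flight writeback. */
struct model_inode {
    pthread_mutex_t lock;
    pthread_cond_t  wb_done;
    int  writeback_pages;   /* pages submitted but not yet completed */
    bool truncated;
};

static void writeback_start(struct model_inode *in)
{
    pthread_mutex_lock(&in->lock);
    in->writeback_pages++;
    pthread_mutex_unlock(&in->lock);
}

static void writeback_end(struct model_inode *in)
{
    pthread_mutex_lock(&in->lock);
    if (--in->writeback_pages == 0)
        pthread_cond_broadcast(&in->wb_done);
    pthread_mutex_unlock(&in->lock);
}

/* Model of the fix: wait for all writeback on this inode to finish before
 * truncating, so its blocks are not released while I/O is in flight. */
static void drop_inode_truncate(struct model_inode *in)
{
    pthread_mutex_lock(&in->lock);
    while (in->writeback_pages > 0)
        pthread_cond_wait(&in->wb_done, &in->lock);
    in->truncated = true;
    pthread_mutex_unlock(&in->lock);
}

/* Completion side, run on another thread (models the end_io path). */
static void *writeback_completion_thread(void *arg)
{
    writeback_end(arg);
    return NULL;
}
```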
  3. 16 Feb, 2019 8 commits
  4. 04 Feb, 2019 1 commit
  5. 22 Jan, 2019 3 commits
    • f2fs: fix to set sbi dirty correctly · 20109873
      Chao Yu authored
      To record the direct IO count, we added two types to enum count_type,
      F2FS_DIO_{WRITE,READ}. But direct IO doesn't dirty filesystem
      metadata, so we don't need to mark the filesystem dirty in
      inc_page_count() for these types; fix it.
      
      Fixes: 02b16d0a ("f2fs: add to account direct IO")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
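      A sketch of the corrected accounting, using a simplified, hypothetical subset of the counter types (the real enum count_type has more members): every counter is still incremented, but only counters that track dirty metadata mark the filesystem dirty.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical subset of enum count_type. */
enum model_count_type {
    MODEL_DIRTY_DENTS,
    MODEL_DIRTY_NODES,
    MODEL_DIRTY_META,
    MODEL_DIO_WRITE,
    MODEL_DIO_READ,
    MODEL_NR_COUNT_TYPE,
};

struct model_sbi {
    int nr_pages[MODEL_NR_COUNT_TYPE];
    bool is_dirty;                  /* models SBI_IS_DIRTY */
};

static void model_inc_page_count(struct model_sbi *sbi,
                                 enum model_count_type type)
{
    sbi->nr_pages[type]++;
    /* Direct IO counters are pure accounting and leave the fs clean. */
    if (type == MODEL_DIRTY_DENTS || type == MODEL_DIRTY_NODES ||
        type == MODEL_DIRTY_META)
        sbi->is_dirty = true;
}
```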
    • f2fs: fix to initialize variable to avoid UBSAN/smatch warning · f9aa52a8
      Chao Yu authored
      As Dan Carpenter reported:
      
      The patch df634f444ee9: "f2fs: use rb_*_cached friends" from Oct 4,
      2018, leads to the following static checker warning:
      
      	fs/f2fs/extent_cache.c:606 f2fs_update_extent_tree_range()
      	error: uninitialized symbol 'leftmost'.
      
      Eric Biggers and Kyungtae Kim also reported a UBSAN warning, described
      below:
      
      We report a bug in linux-4.20.2: "UBSAN: Undefined behaviour in
      fs/f2fs/extent_cache.c"
      
      kernel config: https://kt0755.github.io/etc/config_v4.20_stable
      repro: https://kt0755.github.io/etc/repro.4a3e7.c (f2fs is mounted on
      /mnt/f2fs/)
      
      This arose in f2fs_update_extent_tree_range (fs/f2fs/extent_cache.c:605).
      It seems that, for some reason, its last argument became "24"
      although that was supposed to be bool type.
      
      =========================================
      UBSAN: Undefined behaviour in fs/f2fs/extent_cache.c:605:4
      load of value 24 is not a valid value for type '_Bool'
      CPU: 0 PID: 6774 Comm: syz-executor5 Not tainted 4.20.2 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xb1/0x118 lib/dump_stack.c:113
       ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
       __ubsan_handle_load_invalid_value+0x17a/0x1be lib/ubsan.c:457
       f2fs_update_extent_tree_range+0x1d4a/0x1d50 fs/f2fs/extent_cache.c:605
       f2fs_update_extent_cache+0x2b6/0x350 fs/f2fs/extent_cache.c:804
       f2fs_update_data_blkaddr+0x61/0x70 fs/f2fs/data.c:656
       f2fs_outplace_write_data+0x1d6/0x4b0 fs/f2fs/segment.c:3140
       f2fs_convert_inline_page+0x86d/0x2060 fs/f2fs/inline.c:163
       f2fs_convert_inline_inode+0x6b5/0xad0 fs/f2fs/inline.c:208
       f2fs_preallocate_blocks+0x78b/0xb00 fs/f2fs/data.c:982
       f2fs_file_write_iter+0x31b/0xf40 fs/f2fs/file.c:3062
       call_write_iter include/linux/fs.h:1857 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x538/0x6e0 fs/read_write.c:487
       vfs_write+0x1b3/0x520 fs/read_write.c:549
       ksys_write+0xde/0x1c0 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x7e/0xc0 fs/read_write.c:607
       do_syscall_64+0xbe/0x4f0 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4497b9
      Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
      89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
      01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f1ea15edc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007f1ea15ee6cc RCX: 00000000004497b9
      RDX: 0000000000001000 RSI: 0000000020000140 RDI: 0000000000000013
      RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 000000000000bb50 R14: 00000000006f4bf0 R15: 00007f1ea15ee700
      =========================================
      
      As far as I have checked, this uninitialized variable won't corrupt the
      extent cache, but to avoid these UBSAN and smatch warnings, initialize
      the related variable.
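      The pattern behind the smatch report can be illustrated with a minimal
      userspace sketch. It assumes a lookup helper that writes its `leftmost`
      out-parameter only on some paths; `find_insert_pos()` and
      `insert_would_be_leftmost()` are hypothetical names, not the f2fs
      functions. Initializing the caller's flag means it holds a valid bool on
      every path, including the one where the helper returns early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: find where @key would be inserted into the sorted
 * array @keys and report through *leftmost whether that slot is the first
 * one. If @key is already present, nothing is inserted and *leftmost is
 * left unwritten, like an out-parameter that is set only on some paths.
 */
static int find_insert_pos(const int *keys, size_t n, int key, bool *leftmost)
{
	size_t i = 0;

	assert(keys != NULL || n == 0);

	while (i < n && keys[i] < key)
		i++;
	if (i < n && keys[i] == key)
		return -1;		/* already present: *leftmost untouched */

	*leftmost = (i == 0);
	return (int)i;
}

/*
 * Caller after the fix: the flag is initialized before its address is
 * passed, so it holds a valid bool (0 or 1) even when the helper returns
 * early. Without "= false", the return below would read an uninitialized
 * bool on the early-return path.
 */
static bool insert_would_be_leftmost(const int *keys, size_t n, int key)
{
	bool leftmost = false;

	(void)find_insert_pos(keys, n, key, &leftmost);
	return leftmost;
}
```

      For example, with keys {10, 20, 30}, inserting 5 reports leftmost, while
      inserting 20 (already present) falls back to the initialized `false`
      instead of an indeterminate value.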
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reported-by: Eric Biggers <ebiggers@google.com>
      Reported-by: Kyungtae Kim <kt0755@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: UBSAN: set boolean value iostat_enable correctly · ac929858
      Sheng Yong authored
      When /sys/fs/f2fs/<DEV>/iostat_enable is set to a non-boolean value,
      UBSAN reports the following warning:
      
      [ 7562.295484] ================================================================================
      [ 7562.296531] UBSAN: Undefined behaviour in fs/f2fs/f2fs.h:2776:10
      [ 7562.297651] load of value 64 is not a valid value for type '_Bool'
      [ 7562.298642] CPU: 1 PID: 7487 Comm: dd Not tainted 4.20.0-rc4+ #79
      [ 7562.298653] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [ 7562.298662] Call Trace:
      [ 7562.298760]  dump_stack+0x46/0x5b
      [ 7562.298811]  ubsan_epilogue+0x9/0x40
      [ 7562.298830]  __ubsan_handle_load_invalid_value+0x72/0x90
      [ 7562.298863]  f2fs_file_write_iter+0x29f/0x3f0
      [ 7562.298905]  __vfs_write+0x115/0x160
      [ 7562.298922]  vfs_write+0xa7/0x190
      [ 7562.298934]  ksys_write+0x50/0xc0
      [ 7562.298973]  do_syscall_64+0x4a/0xe0
      [ 7562.298992]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 7562.299001] RIP: 0033:0x7fa45ec19c00
      [ 7562.299004] Code: 73 01 c3 48 8b 0d 88 92 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d dd eb 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce 8f 01 00 48 89 04 24
      [ 7562.299044] RSP: 002b:00007ffca52b49e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 7562.299052] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa45ec19c00
      [ 7562.299059] RDX: 0000000000000400 RSI: 000000000093f000 RDI: 0000000000000001
      [ 7562.299065] RBP: 000000000093f000 R08: 0000000000000004 R09: 0000000000000000
      [ 7562.299071] R10: 00007ffca52b47b0 R11: 0000000000000246 R12: 0000000000000400
      [ 7562.299077] R13: 000000000093f000 R14: 000000000093f400 R15: 0000000000000000
      [ 7562.299091] ================================================================================
      
      So, if iostat_enable is set to any nonzero value, store it as true.
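      A minimal userspace sketch of this normalization, assuming a simplified
      stand-in for the superblock info; `struct demo_sb_info` and
      `demo_store_iostat_enable()` are hypothetical, and `strtoul()` stands in
      for the kernel's `kstrtoul()`. Converting any nonzero integer to `bool`
      already yields 1 in C, so `!!t` mainly makes the intent explicit; a load
      of 64 from a `_Bool` means the raw value reached the bool's storage
      without such a conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tunable field in struct f2fs_sb_info. */
struct demo_sb_info {
	bool iostat_enable;
};

/*
 * Hypothetical sysfs-style store handler: parse the user's string and
 * store the result into the bool field with !!, so any nonzero input
 * (e.g. "64") is recorded as true (1) rather than as a raw integer.
 * Returns 0 on success, -1 if no number could be parsed.
 */
static int demo_store_iostat_enable(struct demo_sb_info *sbi, const char *buf)
{
	char *end;
	unsigned long t;

	assert(sbi != NULL && buf != NULL);

	t = strtoul(buf, &end, 0);	/* userspace stand-in for kstrtoul() */
	if (end == buf)
		return -1;		/* no digits parsed */

	sbi->iostat_enable = !!t;	/* normalize to 0 or 1 */
	return 0;
}
```

      Writing "64" leaves the field at 1, writing "0" clears it, and input
      with no digits is rejected without touching the field.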
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>