1. 14 Mar, 2019 1 commit
  2. 13 Mar, 2019 21 commits
    • Chao Yu's avatar
      f2fs: fix to avoid deadlock in f2fs_read_inline_dir() · aadcef64
      Chao Yu authored
      As Jiqun Li reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202883
      
      sometimes, dead lock when make system call SYS_getdents64 with fsync() is
      called by another process.
      
      monkey running on android9.0
      
      1.  task 9785 held sbi->cp_rwsem and waiting lock_page()
      2.  task 10349 held mm_sem and waiting sbi->cp_rwsem
      3. task 9709 held lock_page() and waiting mm_sem
      
      so this is a dead lock scenario.
      
      task stack is show by crash tools as following
      
      crash_arm64> bt ffffffc03c354080
      PID: 9785   TASK: ffffffc03c354080  CPU: 1   COMMAND: "RxIoScheduler-3"
      >> #7 [ffffffc01b50fac0] __lock_page at ffffff80081b11e8
      
      crash-arm64> bt 10349
      PID: 10349  TASK: ffffffc018b83080  CPU: 1   COMMAND: "BUGLY_ASYNC_UPL"
      >> #3 [ffffffc01f8cfa40] rwsem_down_read_failed at ffffff8008a93afc
           PC: 00000033  LR: 00000000  SP: 00000000  PSTATE: ffffffffffffffff
      
      crash-arm64> bt 9709
      PID: 9709   TASK: ffffffc03e7f3080  CPU: 1   COMMAND: "IntentService[A"
      >> #3 [ffffffc001e67850] rwsem_down_read_failed at ffffff8008a93afc
      >> #8 [ffffffc001e67b80] el1_ia at ffffff8008084fc4
           PC: ffffff8008274114  [compat_filldir64+120]
           LR: ffffff80083584d4  [f2fs_fill_dentries+448]
           SP: ffffffc001e67b80  PSTATE: 80400145
          X29: ffffffc001e67b80  X28: 0000000000000000  X27: 000000000000001a
          X26: 00000000000093d7  X25: ffffffc070d52480  X24: 0000000000000008
          X23: 0000000000000028  X22: 00000000d43dfd60  X21: ffffffc001e67e90
          X20: 0000000000000011  X19: ffffff80093a4000  X18: 0000000000000000
          X17: 0000000000000000  X16: 0000000000000000  X15: 0000000000000000
          X14: ffffffffffffffff  X13: 0000000000000008  X12: 0101010101010101
          X11: 7f7f7f7f7f7f7f7f  X10: 6a6a6a6a6a6a6a6a   X9: 7f7f7f7f7f7f7f7f
           X8: 0000000080808000   X7: ffffff800827409c   X6: 0000000080808000
           X5: 0000000000000008   X4: 00000000000093d7   X3: 000000000000001a
           X2: 0000000000000011   X1: ffffffc070d52480   X0: 0000000000800238
      >> #9 [ffffffc001e67be0] f2fs_fill_dentries at ffffff80083584d0
           PC: 0000003c  LR: 00000000  SP: 00000000  PSTATE: 000000d9
          X12: f48a02ff X11: d4678960 X10: d43dfc00  X9: d4678ae4
           X8: 00000058  X7: d4678994  X6: d43de800  X5: 000000d9
           X4: d43dfc0c  X3: d43dfc10  X2: d46799c8  X1: 00000000
           X0: 00001068
      
      Below potential deadlock will happen between three threads:
      Thread A		Thread B		Thread C
      - f2fs_do_sync_file
       - f2fs_write_checkpoint
        - down_write(&sbi->node_change) -- 1)
      			- do_page_fault
      			 - down_write(&mm->mmap_sem) -- 2)
      			  - do_wp_page
      			   - f2fs_vm_page_mkwrite
      						- getdents64
      						 - f2fs_read_inline_dir
      						  - lock_page -- 3)
        - f2fs_sync_node_pages
         - lock_page -- 3)
      			    - __do_map_lock
      			     - down_read(&sbi->node_change) -- 1)
      						  - f2fs_fill_dentries
      						   - dir_emit
      						    - compat_filldir64
      						     - do_page_fault
      						      - down_read(&mm->mmap_sem) -- 2)
      
      Since f2fs_readdir is protected by inode.i_rwsem, there should not be
      any updates in inode page, we're safe to lookup dents in inode page
      without its lock held, so taking off the lock to improve concurrency
      of readdir and avoid potential deadlock.
      Reported-by: default avatarJiqun Li <jiqun.li@unisoc.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      aadcef64
    • Chao Yu's avatar
      f2fs: fix to adapt small inline xattr space in __find_inline_xattr() · 2c28aba8
      Chao Yu authored
      With below testcase, we will fail to find existed xattr entry:
      
      1. mkfs.f2fs -O extra_attr -O flexible_inline_xattr /dev/zram0
      2. mount -t f2fs -o inline_xattr_size=1 /dev/zram0 /mnt/f2fs/
      3. touch /mnt/f2fs/file
      4. setfattr -n "user.name" -v 0 /mnt/f2fs/file
      5. getfattr -n "user.name" /mnt/f2fs/file
      
      /mnt/f2fs/file: user.name: No such attribute
      
      The reason is for inode which has very small inline xattr size,
      __find_inline_xattr() will fail to traverse any entry due to first
      entry may not be loaded from xattr node yet, later, we may skip to
      check entire xattr datas in __find_xattr(), result in such wrong
      condition.
      
      This patch adds condition to check such case to avoid this issue.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      2c28aba8
    • Chao Yu's avatar
      f2fs: fix to do sanity check with inode.i_inline_xattr_size · dd6c89b5
      Chao Yu authored
      As Paul Bandha reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202709
      
      When I run the poc on the mounted f2fs img I get a buffer overflow in
      read_inline_xattr due to there being no sanity check on the value of
      i_inline_xattr_size.
      
      I created the img by just modifying the value of i_inline_xattr_size
      in the inode:
      
      i_name                        		[test1.txt]
      i_ext: fofs:0 blkaddr:0 len:0
      i_extra_isize                 		[0x      18 : 24]
      i_inline_xattr_size           		[0x    ffff : 65535]
      i_addr[ofs]                   		[0x       0 : 0]
      
      mkdir /mnt/f2fs
      mount ./f2fs1.img /mnt/f2fs
      gcc poc.c -o poc
      ./poc
      
      int main() {
      	int y = syscall(SYS_listxattr, "/mnt/f2fs/test1.txt", NULL, 0);
      	printf("ret %d", y);
      	printf("errno: %d\n", errno);
      
      }
      
       BUG: KASAN: slab-out-of-bounds in read_inline_xattr+0x18f/0x260
       Read of size 262140 at addr ffff88011035efd8 by task f2fs1poc/3263
      
       CPU: 0 PID: 3263 Comm: f2fs1poc Not tainted 4.18.0-custom #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
       Call Trace:
        dump_stack+0x71/0xab
        print_address_description+0x83/0x250
        kasan_report+0x213/0x350
        memcpy+0x1f/0x50
        read_inline_xattr+0x18f/0x260
        read_all_xattrs+0xba/0x190
        f2fs_listxattr+0x9d/0x3f0
        listxattr+0xb2/0xd0
        path_listxattr+0x93/0xe0
        do_syscall_64+0x9d/0x220
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Let's add sanity check for inode.i_inline_xattr_size during f2fs_iget()
      to avoid this issue.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      dd6c89b5
    • Jaegeuk Kim's avatar
      f2fs: give some messages for inline_xattr_size · 70db5b04
      Jaegeuk Kim authored
      This patch adds some kernel messages when user sets wrong inline_xattr_size.
      
      Fixes: 500e0b28 ("f2fs: fix to check inline_xattr_size boundary correctly")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      70db5b04
    • Chao Yu's avatar
      f2fs: don't trigger read IO for beyond EOF page · 86109c90
      Chao Yu authored
      In f2fs_mpage_readpages(), if page is beyond EOF, we should just
      zero out it, but previously, before checking previous mapping
      info, we missed to check filesize boundary, fix it.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      86109c90
    • Chao Yu's avatar
      f2fs: fix to add refcount once page is tagged PG_private · 240a5915
      Chao Yu authored
      As Gao Xiang reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202749
      
      f2fs may skip pageout() due to incorrect page reference count.
      
      The problem here is that MM defined the rule [1] very clearly that
      once page was set with PG_private flag, we should increment the
      refcount in that page, also main flows like pageout(), migrate_page()
      will assume there is one additional page reference count if
      page_has_private() returns true.
      
      But currently, f2fs won't add/del refcount when changing PG_private
      flag. Anyway, f2fs should follow MM's rule to make MM's related flows
      running as expected.
      
      [1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com/Reported-by: default avatarGao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      240a5915
    • Chao Yu's avatar
      f2fs: remove wrong comment in f2fs_invalidate_page() · 25720cc0
      Chao Yu authored
      Since 8c242db9 ("f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer"),
      we've started to not skip clear private flag for atomic_write page
      truncation, so removing old wrong comment in f2fs_invalidate_page().
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      25720cc0
    • Chao Yu's avatar
      f2fs: fix to use kvfree instead of kzfree · 2a6a7e72
      Chao Yu authored
      As Jiqun Li reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202747
      
      System can panic due to using wrong allocate/free function pair
      in xattr interface:
      - use kvmalloc to allocate memory
      - use kzfree to free memory
      
      Let's fix to use kvfree instead of kzfree, BTW, we are safe to
      get rid of kzfree, since there is no such confidential data stored
      as xattr, we don't need to zero it before free memory.
      
      Fixes: 5222595d ("f2fs: use kvmalloc, if kmalloc is failed")
      Reported-by: default avatarJiqun Li <jiqun.li@unisoc.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      2a6a7e72
    • Chao Yu's avatar
      f2fs: print more parameters in trace_f2fs_map_blocks · 76630f20
      Chao Yu authored
      for better map_blocks trace.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      76630f20
    • Chao Yu's avatar
      f2fs: trace f2fs_ioc_shutdown · 559e87c4
      Chao Yu authored
      This patch supports to trace f2fs_ioc_shutdown.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      559e87c4
    • Chao Yu's avatar
      f2fs: fix to avoid deadlock of atomic file operations · 48432984
      Chao Yu authored
      Thread A				Thread B
      - __fput
       - f2fs_release_file
        - drop_inmem_pages
         - mutex_lock(&fi->inmem_lock)
         - __revoke_inmem_pages
          - lock_page(page)
      					- open
      					- f2fs_setattr
      					- truncate_setsize
      					 - truncate_inode_pages_range
      					  - lock_page(page)
      					  - truncate_cleanup_page
      					   - f2fs_invalidate_page
      					    - drop_inmem_page
      					    - mutex_lock(&fi->inmem_lock);
      
      We may encounter above ABBA deadlock as reported by Kyungtae Kim:
      
      I'm reporting a bug in linux-4.17.19: "INFO: task hung in
      drop_inmem_page" (no reproducer)
      
      I think this might be somehow related to the following:
      https://groups.google.com/forum/#!searchin/syzkaller-bugs/INFO$3A$20task$20hung$20in$20%7Csort:date/syzkaller-bugs/c6soBTrdaIo/AjAzPeIzCgAJ
      
      =========================================
      INFO: task syz-executor7:10822 blocked for more than 120 seconds.
            Not tainted 4.17.19 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      syz-executor7   D27024 10822   6346 0x00000004
      Call Trace:
       context_switch kernel/sched/core.c:2867 [inline]
       __schedule+0x721/0x1e60 kernel/sched/core.c:3515
       schedule+0x88/0x1c0 kernel/sched/core.c:3559
       schedule_preempt_disabled+0x18/0x30 kernel/sched/core.c:3617
       __mutex_lock_common kernel/locking/mutex.c:833 [inline]
       __mutex_lock+0x5bd/0x1410 kernel/locking/mutex.c:893
       mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:908
       drop_inmem_page+0xcb/0x810 fs/f2fs/segment.c:327
       f2fs_invalidate_page+0x337/0x5e0 fs/f2fs/data.c:2401
       do_invalidatepage mm/truncate.c:165 [inline]
       truncate_cleanup_page+0x261/0x330 mm/truncate.c:187
       truncate_inode_pages_range+0x552/0x1610 mm/truncate.c:367
       truncate_inode_pages mm/truncate.c:478 [inline]
       truncate_pagecache+0x6d/0x90 mm/truncate.c:801
       truncate_setsize+0x81/0xa0 mm/truncate.c:826
       f2fs_setattr+0x44f/0x1270 fs/f2fs/file.c:781
       notify_change+0xa62/0xe80 fs/attr.c:313
       do_truncate+0x12e/0x1e0 fs/open.c:63
       do_last fs/namei.c:2955 [inline]
       path_openat+0x2042/0x29f0 fs/namei.c:3505
       do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
       do_sys_open+0x35e/0x4e0 fs/open.c:1101
       __do_sys_open fs/open.c:1119 [inline]
       __se_sys_open fs/open.c:1114 [inline]
       __x64_sys_open+0x89/0xc0 fs/open.c:1114
       do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4497b9
      RSP: 002b:00007f734e459c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
      RAX: ffffffffffffffda RBX: 00007f734e45a6cc RCX: 00000000004497b9
      RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
      RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e45a700
      INFO: task syz-executor7:10858 blocked for more than 120 seconds.
            Not tainted 4.17.19 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      syz-executor7   D28880 10858   6346 0x00000004
      Call Trace:
       context_switch kernel/sched/core.c:2867 [inline]
       __schedule+0x721/0x1e60 kernel/sched/core.c:3515
       schedule+0x88/0x1c0 kernel/sched/core.c:3559
       __rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:565 [inline]
       rwsem_down_write_failed+0x5e6/0xc90 kernel/locking/rwsem-xadd.c:594
       call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
       __down_write arch/x86/include/asm/rwsem.h:142 [inline]
       down_write+0x58/0xa0 kernel/locking/rwsem.c:72
       inode_lock include/linux/fs.h:713 [inline]
       do_truncate+0x120/0x1e0 fs/open.c:61
       do_last fs/namei.c:2955 [inline]
       path_openat+0x2042/0x29f0 fs/namei.c:3505
       do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
       do_sys_open+0x35e/0x4e0 fs/open.c:1101
       __do_sys_open fs/open.c:1119 [inline]
       __se_sys_open fs/open.c:1114 [inline]
       __x64_sys_open+0x89/0xc0 fs/open.c:1114
       do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4497b9
      RSP: 002b:00007f734e3b4c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
      RAX: ffffffffffffffda RBX: 00007f734e3b56cc RCX: 00000000004497b9
      RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
      RBP: 000000000071c238 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e3b5700
      INFO: task syz-executor5:10829 blocked for more than 120 seconds.
            Not tainted 4.17.19 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      syz-executor5   D28760 10829   6308 0x80000002
      Call Trace:
       context_switch kernel/sched/core.c:2867 [inline]
       __schedule+0x721/0x1e60 kernel/sched/core.c:3515
       schedule+0x88/0x1c0 kernel/sched/core.c:3559
       io_schedule+0x21/0x80 kernel/sched/core.c:5179
       wait_on_page_bit_common mm/filemap.c:1100 [inline]
       __lock_page+0x2b5/0x390 mm/filemap.c:1273
       lock_page include/linux/pagemap.h:483 [inline]
       __revoke_inmem_pages+0xb35/0x11c0 fs/f2fs/segment.c:231
       drop_inmem_pages+0xa3/0x3e0 fs/f2fs/segment.c:306
       f2fs_release_file+0x2c7/0x330 fs/f2fs/file.c:1556
       __fput+0x2c7/0x780 fs/file_table.c:209
       ____fput+0x1a/0x20 fs/file_table.c:243
       task_work_run+0x151/0x1d0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x8ba/0x30a0 kernel/exit.c:865
       do_group_exit+0x13b/0x3a0 kernel/exit.c:968
       get_signal+0x6bb/0x1650 kernel/signal.c:2482
       do_signal+0x84/0x1b70 arch/x86/kernel/signal.c:810
       exit_to_usermode_loop+0x155/0x190 arch/x86/entry/common.c:162
       prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
       do_syscall_64+0x445/0x4e0 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4497b9
      RSP: 002b:00007f1c68e74ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
      RAX: fffffffffffffe00 RBX: 000000000071bf80 RCX: 00000000004497b9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000071bf80
      RBP: 000000000071bf80 R08: 0000000000000000 R09: 000000000071bf58
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 00007f1c68e759c0 R15: 00007f1c68e75700
      
      This patch tries to use trylock_page to mitigate such deadlock condition
      for fix.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      48432984
    • Chao Yu's avatar
      f2fs: fix to dirty inode for i_mode recovery · ca597bdd
      Chao Yu authored
      As Seulbae Kim reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202637
      
      We didn't recover permission field correctly after sudden power-cut,
      the reason is in setattr we didn't add inode into global dirty list
      once i_mode is changed, so latter checkpoint triggered by fsync will
      not flush last i_mode into disk, result in this problem, fix it.
      Reported-by: default avatarSeulbae Kim <seulbae@gatech.edu>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      ca597bdd
    • Jaegeuk Kim's avatar
      f2fs: give random value to i_generation · 428e3bcf
      Jaegeuk Kim authored
      This follows to give random number to i_generation along with commit
      23253068 ("ext4: improve smp scalability for inode generation")
      
      This can be used for DUN for UFS HW encryption.
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      428e3bcf
    • Gao Xiang's avatar
      f2fs: no need to take page lock in readdir · 613f3dcd
      Gao Xiang authored
      VFS will take inode_lock for readdir, therefore no need to
      take page lock in readdir at all just as the majority of
      other generic filesystems.
      
      This patch improves concurrency since .iterate_shared
      was introduced to VFS years ago.
      Signed-off-by: default avatarGao Xiang <gaoxiang25@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      613f3dcd
    • Chao Yu's avatar
      f2fs: fix to update iostat correctly in IPU path · e46f6bd8
      Chao Yu authored
      In error path of IPU, we didn't account iostat correctly, fix it.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      e46f6bd8
    • Chao Yu's avatar
      f2fs: fix encrypted page memory leak · 6492a335
      Chao Yu authored
      For IPU path of f2fs_do_write_data_page(), in its error path, we
      need to release encrypted page and fscrypt context, otherwise it
      will cause memory leak.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      6492a335
    • Chao Yu's avatar
      f2fs: make fault injection covering __submit_flush_wait() · dc37910d
      Chao Yu authored
      This patch changes to allow failure of f2fs_bio_alloc() in
      __submit_flush_wait(), which can simulate flush error in checkpoint()
      for covering more error paths.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      dc37910d
    • Chao Yu's avatar
      f2fs: fix to retry fill_super only if recovery failed · aa2c8c43
      Chao Yu authored
      With current retry mechanism in f2fs_fill_super, first fill_super
      fails due to no memory, then second fill_super runs w/o recovery,
      if we succeed, we may lose fsynced data, it doesn't make sense.
      
      Let's retry fill_super only if it occurs non-ENOMEM error during
      recovery.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      aa2c8c43
    • Gao Xiang's avatar
      f2fs: silence VM_WARN_ON_ONCE in mempool_alloc · bc73a4b2
      Gao Xiang authored
      Note that __GFP_ZERO is not supported for mempool_alloc,
      which also documented in the mempool_alloc comments.
      Signed-off-by: default avatarGao Xiang <gaoxiang25@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      bc73a4b2
    • Zeng Guangyue's avatar
      f2fs: correct spelling mistake · 68b79cdc
      Zeng Guangyue authored
      correct spelling mistake for "nunmber"
      Signed-off-by: default avatarZeng Guangyue <zengguangyue@hisilicon.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      68b79cdc
    • Jaegeuk Kim's avatar
      f2fs: fix wrong #endif · 0af725fc
      Jaegeuk Kim authored
      We have to cover whole headerfile with last #endif.
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      0af725fc
  3. 06 Mar, 2019 8 commits
    • Jaegeuk Kim's avatar
      f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG · fb40d618
      Jaegeuk Kim authored
      If we met this once, let fsck.f2fs clear this only.
      Note that, this addresses all the subtle fault injection test.
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      fb40d618
    • Chao Yu's avatar
      f2fs: don't allow negative ->write_io_size_bits · 6d52e135
      Chao Yu authored
      As Dan reported:
      
      "We put an upper bound on ->write_io_size_bits but we don't have a lower
      bound."
      
      So let's add lower bound check for ->write_io_size_bits in parse_options().
      
      [We don't allow configuring ->write_io_size_bits to zero, since at least
      we need to fill one dummy page for aligned IO.]
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      6d52e135
    • Chao Yu's avatar
      f2fs: fix to check inline_xattr_size boundary correctly · 500e0b28
      Chao Yu authored
      We use below condition to check inline_xattr_size boundary:
      
      	if (!F2FS_OPTION(sbi).inline_xattr_size ||
      		F2FS_OPTION(sbi).inline_xattr_size >=
      				DEF_ADDRS_PER_INODE -
      				F2FS_TOTAL_EXTRA_ATTR_SIZE -
      				DEF_INLINE_RESERVED_SIZE -
      				DEF_MIN_INLINE_SIZE)
      
      There is there problems in that check:
      - we should allow inline_xattr_size equaling to min size of inline
      {data,dentry} area.
      - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
      different size unit, previous one is 4 bytes, latter one is 1 bytes.
      - DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
      however, we need to consider min size of inline dentry area as well,
      minimal inline dentry should at least contain two entries: '.' and
      '..', so that min inline_dentry size is 40 bytes.
      
      .bitmap		1 * 1 = 1
      .reserved	1 * 1 = 1
      .dentry		11 * 2 = 22
      .filename	8 * 2 = 16
      total		40
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      500e0b28
    • Sahitya Tummala's avatar
      f2fs: do not use mutex lock in atomic context · 9083977d
      Sahitya Tummala authored
      Fix below warning coming because of using mutex lock in atomic context.
      
      BUG: sleeping function called from invalid context at kernel/locking/mutex.c:98
      in_atomic(): 1, irqs_disabled(): 0, pid: 585, name: sh
      Preemption disabled at: __radix_tree_preload+0x28/0x130
      Call trace:
       dump_backtrace+0x0/0x2b4
       show_stack+0x20/0x28
       dump_stack+0xa8/0xe0
       ___might_sleep+0x144/0x194
       __might_sleep+0x58/0x8c
       mutex_lock+0x2c/0x48
       f2fs_trace_pid+0x88/0x14c
       f2fs_set_node_page_dirty+0xd0/0x184
      
      Do not use f2fs_radix_tree_insert() to avoid doing cond_resched() with
      spin_lock() acquired.
      Signed-off-by: default avatarSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      9083977d
    • Chao Yu's avatar
      f2fs: fix potential data inconsistence of checkpoint · c42d28ce
      Chao Yu authored
      Previously, we changed lock from cp_rwsem to node_change, it solved
      the deadlock issue which was caused by below race condition:
      
      Thread A			Thread B
      - f2fs_setattr
       - f2fs_lock_op  -- read_lock
       - dquot_transfer
        - __dquot_transfer
         - dquot_acquire
          - commit_dqblk
           - f2fs_quota_write
            - f2fs_write_begin
             - f2fs_write_failed
      				- write_checkpoint
      				 - block_operations
      				  - f2fs_lock_all  -- write_lock
              - f2fs_truncate_blocks
               - f2fs_lock_op  -- read_lock
      
      But it breaks the sematics of cp_rwsem, in other callers like:
      - f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed
      - f2fs_direct_IO -> f2fs_write_failed
      
      We allow to truncate dnode w/o cp_rwsem held, result in incorrect sit
      bitmap update, which can cause further data corruption.
      
      So this patch reverts previous fix implementation, and try to fix
      deadlock by skipping calling f2fs_truncate_blocks() in f2fs_write_failed()
      only for quota file, and keep the preallocated data/node in the tail of
      quota file, we can expecte that the preallocated space can be used to
      store quota info latter soon.
      
      Fixes: af033b2a ("f2fs: guarantee journalled quota data by checkpoint")
      Signed-off-by: default avatarGao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: default avatarSheng Yong <shengyong1@huawei.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c42d28ce
    • Chengguang Xu's avatar
      f2fs: jump to label 'free_node_inode' when failing from d_make_root() · 025cdb16
      Chengguang Xu authored
      When sb->s_root is NULL dput() will do nothing,
      so jump to label 'free_node_inode' instead of lable
      'free_root_inode' when failing from d_make_root().
      Signed-off-by: default avatarChengguang Xu <cgxu519@gmx.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      025cdb16
    • Chao Yu's avatar
      f2fs: fix to document inline_xattr_size option · 7321dd97
      Chao Yu authored
      We missed to add document for inline_xattr_size mount option in f2fs.txt,
      add it.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      7321dd97
    • zhengliang's avatar
      f2fs: fix to data block override node segment by mistake · a0770e13
      zhengliang authored
      v4: Rearrange the previous three versions.
      
      The following scenario could lead to data block override by mistake.
      
      TASK A            |  TASK kworker                                            |     TASK B                                            |       TASK C
                        |                                                          |                                                       |
      open              |                                                          |                                                       |
      write             |                                                          |                                                       |
      close             |                                                          |                                                       |
                        |  f2fs_write_data_pages                                   |                                                       |
                        |    f2fs_write_cache_pages                                |                                                       |
                        |      f2fs_outplace_write_data                            |                                                       |
                        |        f2fs_allocate_data_block (get block in seg S,     |                                                       |
                        |                                  S is full, and only     |                                                       |
                        |                                  have this valid data    |                                                       |
                        |                                  block)                  |                                                       |
                        |          allocate_segment                                |                                                       |
                        |          locate_dirty_segment (mark S as PRE)            |                                                       |
                        |        f2fs_submit_page_write (submit but is not         |                                                       |
                        |                                written on dev)           |                                                       |
      unlink            |                                                          |                                                       |
       iput_final       |                                                          |                                                       |
        f2fs_drop_inode |                                                          |                                                       |
          f2fs_truncate |                                                          |                                                       |
       (not evict)      |                                                          |                                                       |
                        |                                                          | write_checkpoint                                      |
                        |                                                          |  flush merged bio but not wait file data writeback    |
                        |                                                          |  set_prefree_as_free (mark S as FREE)                 |
                        |                                                          |                                                       | update NODE/DATA
                        |                                                          |                                                       | allocate_segment (select S)
                        |     writeback done                                       |                                                       |
      
      So we need to guarantee io complete before truncate inode in f2fs_drop_inode.
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarZheng Liang <zhengliang6@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      a0770e13
  4. 16 Feb, 2019 8 commits
  5. 04 Feb, 2019 1 commit
  6. 22 Jan, 2019 1 commit