1. 16 Oct, 2018 5 commits
    • Chao Yu's avatar
      f2fs: shrink sbi->sb_lock coverage in set_file_temperature() · ed15ba14
      Chao Yu authored
      file_set_{cold,hot} doesn't need holding sbi->sb_lock, so moving them
      out of the lock.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      ed15ba14
    • Chao Yu's avatar
      f2fs: use rb_*_cached friends · 4dada3fd
      Chao Yu authored
      As rbtree supports caching leftmost node natively, update f2fs codes
      to use rb_*_cached helpers to speed up leftmost node visiting.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      4dada3fd
    • Chao Yu's avatar
      f2fs: fix to recover cold bit of inode block during POR · ef2a0071
      Chao Yu authored
      Testcase to reproduce this bug:
      1. mkfs.f2fs /dev/sdd
      2. mount -t f2fs /dev/sdd /mnt/f2fs
      3. touch /mnt/f2fs/file
      4. sync
      5. chattr +A /mnt/f2fs/file
      6. xfs_io -f /mnt/f2fs/file -c "fsync"
      7. godown /mnt/f2fs
      8. umount /mnt/f2fs
      9. mount -t f2fs /dev/sdd /mnt/f2fs
      10. chattr -A /mnt/f2fs/file
      11. xfs_io -f /mnt/f2fs/file -c "fsync"
      12. umount /mnt/f2fs
      13. mount -t f2fs /dev/sdd /mnt/f2fs
      14. lsattr /mnt/f2fs/file
      
      -----------------N- /mnt/f2fs/file
      
      But actually, we expect the corrct result is:
      
      -------A---------N- /mnt/f2fs/file
      
      The reason is in step 9) we missed to recover cold bit flag in inode
      block, so later, in fsync, we will skip write inode block due to below
      condition check, result in lossing data in another SPOR.
      
      f2fs_fsync_node_pages()
      	if (!IS_DNODE(page) || !is_cold_node(page))
      		continue;
      
      Note that, I guess that some non-dir inode has already lost cold bit
      during POR, so in order to reenable recovery for those inode, let's
      try to recover cold bit in f2fs_iget() to save more fsynced data.
      
      Fixes: c5667575 ("f2fs: remove unneeded set_cold_node()")
      Cc: <stable@vger.kernel.org> 4.17+
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      ef2a0071
    • Chao Yu's avatar
      f2fs: submit cached bio to avoid endless PageWriteback · 48018b4c
      Chao Yu authored
      When migrating encrypted block from background GC thread, we only add
      them into f2fs inner bio cache, but forget to submit the cached bio, it
      may cause potential deadlock when we are waiting page writebacked, fix
      it.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      48018b4c
    • Daniel Rosenberg's avatar
      f2fs: checkpoint disabling · 4354994f
      Daniel Rosenberg authored
      Note that, it requires "f2fs: return correct errno in f2fs_gc".
      
      This adds a lightweight non-persistent snapshotting scheme to f2fs.
      
      To use, mount with the option checkpoint=disable, and to return to
      normal operation, remount with checkpoint=enable. If the filesystem
      is shut down before remounting with checkpoint=enable, it will revert
      back to its apparent state when it was first mounted with
      checkpoint=disable. This is useful for situations where you wish to be
      able to roll back the state of the disk in case of some critical
      failure.
      Signed-off-by: default avatarDaniel Rosenberg <drosen@google.com>
      [Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      4354994f
  2. 03 Oct, 2018 1 commit
    • Jaegeuk Kim's avatar
      f2fs: clear PageError on the read path · fb7d70db
      Jaegeuk Kim authored
      When running fault injection test, I hit somewhat wrong behavior in f2fs_gc ->
      gc_data_segment():
      
      0. fault injection generated some PageError'ed pages
      
      1. gc_data_segment
       -> f2fs_get_read_data_page(REQ_RAHEAD)
      
      2. move_data_page
       -> f2fs_get_lock_data_page()
        -> f2f_get_read_data_page()
         -> f2fs_submit_page_read()
          -> submit_bio(READ)
        -> return EIO due to PageError
        -> fail to move data
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      fb7d70db
  3. 01 Oct, 2018 6 commits
  4. 30 Sep, 2018 1 commit
  5. 29 Sep, 2018 1 commit
  6. 28 Sep, 2018 2 commits
  7. 26 Sep, 2018 8 commits
  8. 20 Sep, 2018 2 commits
  9. 19 Sep, 2018 1 commit
  10. 12 Sep, 2018 9 commits
    • Chao Yu's avatar
      f2fs: split IO error injection according to RW · 6f5c2ed0
      Chao Yu authored
      This patch adds to support injecting error for write IO, this can simulate
      IO error like fail_make_request or dm_flakey does.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      6f5c2ed0
    • Chao Yu's avatar
      f2fs: add SPDX license identifiers · 7c1a000d
      Chao Yu authored
      Remove the verbose license text from f2fs files and replace them with
      SPDX tags.  This does not change the license of any of the code.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      7c1a000d
    • Chengguang Xu's avatar
      f2fs: surround fault_injection related option parsing using CONFIG_F2FS_FAULT_INJECTION · 4cb037ec
      Chengguang Xu authored
      It's a little bit strange when fault_injection related
      options fail with -EINVAL which were already disabled
      from config, so surround all fault_injection related option
      parsing code using CONFIG_F2FS_FAULT_INJECTION. Meanwhile,
      slightly change warning message to keep consistency with
      option POSIX_ACL and FS_XATTR.
      Signed-off-by: default avatarChengguang Xu <cgxu519@gmx.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      4cb037ec
    • Wang Shilong's avatar
      f2fs: fix setattr project check upon fssetxattr ioctl · c8e92757
      Wang Shilong authored
      Currently, project quota could be changed by fssetxattr
      ioctl, and existed permission check inode_owner_or_capable()
      is obviously not enough, just think that common users could
      change project id of file, that could make users to
      break project quota easily.
      
      This patch try to follow same regular of xfs project
      quota:
      
      "Project Quota ID state is only allowed to change from
      within the init namespace. Enforce that restriction only
      if we are trying to change the quota ID state.
      Everything else is allowed in user namespaces."
      
      Besides that, check and set project id'state should
      be an atomic operation, protect whole operation with
      inode lock.
      Signed-off-by: default avatarWang Shilong <wshilong@ddn.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c8e92757
    • Zhikang Zhang's avatar
      f2fs: avoid sleeping under spin_lock · b430f726
      Zhikang Zhang authored
      In the call trace below, we might sleep in function dput().
      
      So in order to avoid sleeping under spin_lock, we remove f2fs_mark_inode_dirty_sync
      from __try_update_largest_extent && __drop_largest_extent.
      
      BUG: sleeping function called from invalid context at fs/dcache.c:796
      Call trace:
      	dump_backtrace+0x0/0x3f4
      	show_stack+0x24/0x30
      	dump_stack+0xe0/0x138
      	___might_sleep+0x2a8/0x2c8
      	__might_sleep+0x78/0x10c
      	dput+0x7c/0x750
      	block_dump___mark_inode_dirty+0x120/0x17c
      	__mark_inode_dirty+0x344/0x11f0
      	f2fs_mark_inode_dirty_sync+0x40/0x50
      	__insert_extent_tree+0x2e0/0x2f4
      	f2fs_update_extent_tree_range+0xcf4/0xde8
      	f2fs_update_extent_cache+0x114/0x12c
      	f2fs_update_data_blkaddr+0x40/0x50
      	write_data_page+0x150/0x314
      	do_write_data_page+0x648/0x2318
      	__write_data_page+0xdb4/0x1640
      	f2fs_write_cache_pages+0x768/0xafc
      	__f2fs_write_data_pages+0x590/0x1218
      	f2fs_write_data_pages+0x64/0x74
      	do_writepages+0x74/0xe4
      	__writeback_single_inode+0xdc/0x15f0
      	writeback_sb_inodes+0x574/0xc98
      	__writeback_inodes_wb+0x190/0x204
      	wb_writeback+0x730/0xf14
      	wb_check_old_data_flush+0x1bc/0x1c8
      	wb_workfn+0x554/0xf74
      	process_one_work+0x440/0x118c
      	worker_thread+0xac/0x974
      	kthread+0x1a0/0x1c8
      	ret_from_fork+0x10/0x1c
      Signed-off-by: default avatarZhikang Zhang <zhangzhikang1@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      b430f726
    • Chao Yu's avatar
      f2fs: plug readahead IO in readdir() · e1293bdf
      Chao Yu authored
      Add a plug to merge readahead IO in readdir(), expecting it can
      reduce bio count before submitting to block layer.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      e1293bdf
    • Chao Yu's avatar
      f2fs: fix to do sanity check with current segment number · 042be0f8
      Chao Yu authored
      https://bugzilla.kernel.org/show_bug.cgi?id=200219
      
      Reproduction way:
      - mount image
      - run poc code
      - umount image
      
      F2FS-fs (loop1): Bitmap was wrongly set, blk:15364
      ------------[ cut here ]------------
      kernel BUG at /home/yuchao/git/devf2fs/segment.c:2061!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 2 PID: 17686 Comm: umount Tainted: G        W  O      4.18.0-rc2+ #39
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: update_sit_entry+0x459/0x4e0 [f2fs]
      Code: e8 1c b5 fd ff 0f 0b 0f 0b 8b 45 e4 c7 44 24 08 9c 7a 6c f8 c7 44 24 04 bc 4a 6c f8 89 44 24 0c 8b 06 89 04 24 e8 f7 b4 fd ff <0f> 0b 8b 45 e4 0f b6 d2 89 54 24 10 c7 44 24 08 60 7a 6c f8 c7 44
      EAX: 00000032 EBX: 000000f8 ECX: 00000002 EDX: 00000001
      ESI: d7177000 EDI: f520fe68 EBP: d6477c6c ESP: d6477c34
      DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010282
      CR0: 80050033 CR2: b7fbe000 CR3: 2a99b3c0 CR4: 000406f0
      Call Trace:
       f2fs_allocate_data_block+0x124/0x580 [f2fs]
       do_write_page+0x78/0x150 [f2fs]
       f2fs_do_write_node_page+0x25/0xa0 [f2fs]
       __write_node_page+0x2bf/0x550 [f2fs]
       f2fs_sync_node_pages+0x60e/0x6d0 [f2fs]
       ? sync_inode_metadata+0x2f/0x40
       ? f2fs_write_checkpoint+0x28f/0x7d0 [f2fs]
       ? up_write+0x1e/0x80
       f2fs_write_checkpoint+0x2a9/0x7d0 [f2fs]
       ? mark_held_locks+0x5d/0x80
       ? _raw_spin_unlock_irq+0x27/0x50
       kill_f2fs_super+0x68/0x90 [f2fs]
       deactivate_locked_super+0x3d/0x70
       deactivate_super+0x40/0x60
       cleanup_mnt+0x39/0x70
       __cleanup_mnt+0x10/0x20
       task_work_run+0x81/0xa0
       exit_to_usermode_loop+0x59/0xa7
       do_fast_syscall_32+0x1f5/0x22c
       entry_SYSENTER_32+0x53/0x86
      EIP: 0xb7f95c51
      Code: c1 1e f7 ff ff 89 e5 8b 55 08 85 d2 8b 81 64 cd ff ff 74 02 89 02 5d c3 8b 0c 24 c3 8b 1c 24 c3 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
      EAX: 00000000 EBX: 0871ab90 ECX: bfb2cd00 EDX: 00000000
      ESI: 00000000 EDI: 0871ab90 EBP: 0871ab90 ESP: bfb2cd7c
      DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246
      Modules linked in: f2fs(O) crc32_generic bnep rfcomm bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev aesni_intel snd_seq_device aes_i586 snd_timer crypto_simd snd cryptd soundcore mac_hid serio_raw video i2c_piix4 parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000 [last unloaded: f2fs]
      ---[ end trace d423f83982cfcdc5 ]---
      
      The reason is, different log headers using the same segment, once
      one log's next block address is used by another log, it will cause
      panic as above.
      
      Main area: 24 segs, 24 secs 24 zones
        - COLD  data: 0, 0, 0
        - WARM  data: 1, 1, 1
        - HOT   data: 20, 20, 20
        - Dir   dnode: 22, 22, 22
        - File   dnode: 22, 22, 22
        - Indir nodes: 21, 21, 21
      
      So this patch adds sanity check to detect such condition to avoid
      this issue.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      042be0f8
    • Chao Yu's avatar
      f2fs: fix memory leak of percpu counter in fill_super() · 4a70e255
      Chao Yu authored
      In fill_super -> init_percpu_info, we should destroy percpu counter
      in error path, otherwise memory allcoated for percpu counter will
      leak.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      4a70e255
    • Chao Yu's avatar
      f2fs: fix memory leak of write_io in fill_super() · 0b2103e8
      Chao Yu authored
      It needs to release memory allocated for sbi->write_io in error path,
      otherwise, it will cause memory leak.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      0b2103e8
  11. 11 Sep, 2018 3 commits
    • Chengguang Xu's avatar
      f2fs: cache NULL when both default_acl and acl are NULL · 313ed62a
      Chengguang Xu authored
      default_acl and acl of newly created inode will be initiated
      as ACL_NOT_CACHED in vfs function inode_init_always() and later
      will be updated by calling xxx_init_acl() in specific filesystems.
      Howerver, when default_acl and acl are NULL then they keep the value
      of ACL_NOT_CACHED, this patch tries to cache NULL for acl/default_acl
      in this case.
      Signed-off-by: default avatarChengguang Xu <cgxu519@gmx.com>
      Acked-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      313ed62a
    • Chao Yu's avatar
      f2fs: fix to flush all dirty inodes recovered in readonly fs · 1378752b
      Chao Yu authored
      generic/417 reported as blow:
      
      ------------[ cut here ]------------
      kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 1 PID: 21697 Comm: umount Tainted: G        W  O      4.18.0-rc2+ #39
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
      Call Trace:
       ? _raw_spin_unlock+0x2c/0x50
       evict+0xa8/0x170
       dispose_list+0x34/0x40
       evict_inodes+0x118/0x120
       generic_shutdown_super+0x41/0x100
       ? rcu_read_lock_sched_held+0x97/0xa0
       kill_block_super+0x22/0x50
       kill_f2fs_super+0x6f/0x80 [f2fs]
       deactivate_locked_super+0x3d/0x70
       deactivate_super+0x40/0x60
       cleanup_mnt+0x39/0x70
       __cleanup_mnt+0x10/0x20
       task_work_run+0x81/0xa0
       exit_to_usermode_loop+0x59/0xa7
       do_fast_syscall_32+0x1f5/0x22c
       entry_SYSENTER_32+0x53/0x86
      EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
      
      It can simply reproduced with scripts:
      
      Enable quota feature during mkfs.
      
      Testcase1:
      1. mkfs.f2fs /dev/zram0
      2. mount -t f2fs /dev/zram0 /mnt/f2fs
      3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
      4. godown /mnt/f2fs
      5. umount /mnt/f2fs
      6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
      7. umount /mnt/f2fs
      
      Testcase2:
      1. mkfs.f2fs /dev/zram0
      2. mount -t f2fs /dev/zram0 /mnt/f2fs
      3. touch /mnt/f2fs/file
      4. create process[pid = x] do:
      	a) open /mnt/f2fs/file;
      	b) unlink /mnt/f2fs/file
      5. godown -f /mnt/f2fs
      6. kill process[pid = x]
      7. umount /mnt/f2fs
      8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
      9. umount /mnt/f2fs
      
      The reason is: during recovery, i_{c,m}time of inode will be updated, then
      the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
      global list, so later write_checkpoint will not flush such dirty inode into
      node page.
      
      Once umount is called, sync_filesystem() in generic_shutdown_super() will
      skip syncng dirty inodes due to sb_rdonly check, leaving dirty inodes
      there.
      
      To solve this issue, during umount, add remove SB_RDONLY flag in
      sb->s_flags, to make sure sync_filesystem() will not be skipped.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      1378752b
    • Yunlei He's avatar
      f2fs: report error if quota off error during umount · cda9cc59
      Yunlei He authored
      Now, we depend on fsck to ensure quota file data is ok,
      so we scan whole partition if checkpoint without umount
      flag. It's same for quota off error case, which may make
      quota file data inconsistent.
      
      generic/019 reports below error:
      
       __quota_error: 1160 callbacks suppressed
       Quota error (device zram1): write_blk: dquota write failed
       Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
       Quota error (device zram1): write_blk: dquota write failed
       Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
       Quota error (device zram1): write_blk: dquota write failed
       Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
       Quota error (device zram1): write_blk: dquota write failed
       Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
       Quota error (device zram1): write_blk: dquota write failed
       Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
       VFS: Busy inodes after unmount of zram1. Self-destruct in 5 seconds.  Have a nice day...
      
      If we failed in below path due to fail to write dquot block, we will miss
      to release quota inode, fix it.
      
      - f2fs_put_super
       - f2fs_quota_off_umount
        - f2fs_quota_off
         - f2fs_quota_sync   <-- failed
         - dquot_quota_off   <-- missed to call
      Signed-off-by: default avatarYunlei He <heyunlei@huawei.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      cda9cc59
  12. 07 Sep, 2018 1 commit