1. 04 Jun, 2018 4 commits
    • f2fs: fix to clear FI_VOLATILE_FILE correctly · dfa74280
      Chao Yu authored
      Thread A			Thread B
      - f2fs_release_file
       - clear_inode_flag(FI_VOLATILE_FILE)
      				- wb_writeback
      				 - writeback_sb_inodes
      				  - __writeback_single_inode
      				   - do_writepages
      				    - f2fs_write_data_pages
      				     - __write_data_page
      				     all of the volatile file's pages
      				     are written back to storage
       - set_inode_flag(FI_DROP_CACHE)
       - filemap_fdatawrite
      
      There is a window in which mm can flush all dirty pages of a volatile
      file, because the inode is tagged with neither FI_VOLATILE_FILE nor
      FI_DROP_CACHE. We should never write back page #0, and writing back the
      other pages is unnecessary.
      
      This patch relocates clear_inode_flag(FI_VOLATILE_FILE) so that the
      FI_VOLATILE_FILE flag remains set until all dirty pages have been dropped,
      which avoids the issue.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
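The ordering change can be illustrated with a small single-threaded model. This is only a sketch, not the kernel code: the flag names come from the commit message, while `writeback_would_flush()`, `release_old_order()` and `release_new_order()` are hypothetical helpers that record whether a concurrent flusher could ever see the inode with neither flag set.

```c
#include <stdbool.h>

/* Flag bits named after the inode flags in the commit message. */
enum {
	FI_VOLATILE_FILE = 1u << 0,
	FI_DROP_CACHE    = 1u << 1,
};

/* Assumed rule: the flusher writes pages only when neither flag is set. */
static bool writeback_would_flush(unsigned int flags)
{
	return (flags & (FI_VOLATILE_FILE | FI_DROP_CACHE)) == 0;
}

/* Old order: FI_VOLATILE_FILE is cleared first. Returns true if a window exists. */
static bool release_old_order(unsigned int flags)
{
	bool window = false;

	flags &= ~FI_VOLATILE_FILE;
	window |= writeback_would_flush(flags);	/* Thread B could run here */
	flags |= FI_DROP_CACHE;
	window |= writeback_would_flush(flags);	/* filemap_fdatawrite() step */
	return window;
}

/* New order: FI_VOLATILE_FILE stays set until the dirty pages are dropped. */
static bool release_new_order(unsigned int flags)
{
	bool window = false;

	flags |= FI_DROP_CACHE;
	window |= writeback_would_flush(flags);	/* Thread B could run here */
	/* filemap_fdatawrite() would drop the dirty pages at this point */
	flags &= ~FI_DROP_CACHE;
	window |= writeback_would_flush(flags);	/* still volatile: no flush */
	flags &= ~FI_VOLATILE_FILE;		/* nothing dirty is left by now */
	return window;
}
```

Starting from an inode that has only `FI_VOLATILE_FILE` set, the old order reports a window and the new order does not.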
    • f2fs: let sync node IO interrupt async one · c29fd0c0
      Chao Yu authored
      Although mixed sync/async IOs can have contiguous LBAs, they have
      different IO priorities, so the block IO scheduler adds them to different
      queues and commits them separately. This results in split IOs, which
      causes worse performance.
      
      This patch gives high priority to synchronous node IO: once a synchronous
      flow starts, it can interrupt the asynchronous writeback flow of the
      system flusher, so larger IOs can be expected.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't change wbc->sync_mode · aae764ec
      Chao Yu authored
      We should never falsify wbc->sync_mode passed from mm, otherwise
      mm can trigger writeback with wrong IO priority.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to update mtime correctly · a1f72ac2
      Chao Yu authored
      If the system time is changed to the past, get_mtime() will return an
      overflowed time and SIT_I(sbi)->max_mtime will be updated
      incorrectly. This patch fixes both issues.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
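The overflow can be reproduced in user space with unsigned 64-bit arithmetic. The sketch below assumes the modification time is derived as "elapsed time plus seconds since mount"; the function names and parameters are hypothetical and only illustrate the guard, not the exact kernel change.

```c
#include <stdint.h>

/* Naive form: wraps to a huge value when now < mounted_time. */
static uint64_t mtime_naive(uint64_t elapsed, uint64_t mounted_time, uint64_t now)
{
	return elapsed + now - mounted_time;
}

/* Guarded form: the result never wraps below zero. */
static uint64_t mtime_guarded(uint64_t elapsed, uint64_t mounted_time, uint64_t now)
{
	uint64_t diff;

	if (now >= mounted_time)
		return elapsed + (now - mounted_time);

	/* The system time was set to the past. */
	diff = mounted_time - now;
	return elapsed >= diff ? elapsed - diff : 0;
}
```

With `elapsed = 100` and `mounted_time = 1000`, a clock moved back to `800` makes the naive form wrap around, while the guarded form clamps to zero.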
  2. 31 May, 2018 36 commits
    • fs: f2fs: insert space around that ':' and ', ' · 1061fd48
      youngjun yoo authored
      clean up checkpatch error:
      ERROR: space required after that ':'
      ERROR: space required after that ','
      Signed-off-by: youngjun yoo <youngjun.willow@gmail.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • fs: f2fs: add missing blank lines after declarations · f11e98bd
      youngjun yoo authored
      clean up checkpatch warning:
      WARNING: Missing a blank line after declarations
      Signed-off-by: youngjun yoo <youngjun.willow@gmail.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • fs: f2fs: changed variable type of offset "unsigned" to "loff_t" · 193bea1d
      youngjun yoo authored
      clean up checkpatch warning:
      WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
      Signed-off-by: youngjun yoo <youngjun.willow@gmail.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clean up symbol namespace · 4d57b86d
      Chao Yu authored
      As Ted reported:
      
      "Hi, I was looking at f2fs's sources recently, and I noticed that there
      is a very large number of non-static symbols which don't have a f2fs
      prefix.  There's well over a hundred (see attached below).
      
      As one example, in fs/f2fs/dir.c there is:
      
      unsigned char get_de_type(struct f2fs_dir_entry *de)
      
      This function is clearly only useful for f2fs, but it has a generic
      name.  This means that if any other file system tries to have the same
      symbol name, there will be a symbol conflict and the kernel would not
      successfully build.  It also means that when someone is looking f2fs
      sources, it's not at all obvious whether a function such as
      read_data_page(), invalidate_blocks(), is a generic kernel function
      found in the fs, mm, or block layers, or a f2fs specific function.
      
      You might want to fix this at some point.  Hopefully Kent's bcachefs
      isn't similarly using genericly named functions, since that might
      cause conflicts with f2fs's functions --- but just as this would be a
      problem that we would rightly insist that Kent fix, this is something
      that we should have rightly insisted that f2fs should have fixed
      before it was integrated into the mainline kernel.
      
      acquire_orphan_inode
      add_ino_entry
      add_orphan_inode
      allocate_data_block
      allocate_new_segments
      alloc_nid
      alloc_nid_done
      alloc_nid_failed
      available_free_memory
      ...."
      
      This patch adds an "f2fs_" prefix to all non-static symbols in order to:
      a) avoid conflicts with other generic kernel symbols;
      b) indicate that a function is f2fs-specific rather than generic.
      Reported-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: make set_de_type() static · 2e79d951
      Chao Yu authored
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: make __f2fs_write_data_pages() static · fc99fe27
      Chao Yu authored
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to avoid accessing cross the boundary · 9fd62605
      Chao Yu authored
      With io_bits configured to 2 and LFS mode enabled, generic/017 reports the dmesg below:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000039
      *pdpt = 000000002fcb2001 *pde = 0000000000000000
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: crc32_generic zram f2fs(O) bnep rfcomm bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi pcbc snd_seq joydev aesni_intel aes_i586 snd_seq_device snd_timer crypto_simd cryptd snd soundcore i2c_piix4 serio_raw mac_hid video parport_pc ppdev lp parport hid_generic usbhid psmouse hid e1000
      CPU: 2 PID: 20779 Comm: xfs_io Tainted: G           O      4.17.0-rc2 #38
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: is_checkpointed_data+0x84/0xd0 [f2fs]
      EFLAGS: 00010207 CPU: 2
      EAX: 00000000 EBX: f5cd7000 ECX: fffffe32 EDX: 00000039
      ESI: 000001cd EDI: ec95fb6c EBP: e264bd80 ESP: e264bd6c
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      CR0: 80050033 CR2: 00000039 CR3: 2fe55660 CR4: 000406f0
      Call Trace:
       __exchange_data_block+0xb3f/0x1000 [f2fs]
       f2fs_fallocate+0xab9/0x16b0 [f2fs]
       vfs_fallocate+0x17c/0x2d0
       ksys_fallocate+0x42/0x70
       sys_fallocate+0x31/0x40
       do_fast_syscall_32+0xaa/0x22c
       entry_SYSENTER_32+0x4c/0x7b
      EIP: 0xb7f98c51
      EFLAGS: 00000293 CPU: 2
      EAX: ffffffda EBX: 00000003 ECX: 00000008 EDX: 01001000
      ESI: 00000000 EDI: 00001000 EBP: 00000000 ESP: bfc0357c
       DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      Code: 00 00 d3 e8 8b 4d ec 2b 02 8b 55 f0 6b c0 1c 03 41 70 29 d6 8b 93 d0 06 00 00 8b 40 0c 83 ea 01 21 d6 89 f2 89 f1 c1 ea 03 f7 d1 <0f> be 14 10 83 e1 07 b8 01 00 00 00 d3 e0 85 c2 89 f8 0f 95 c3
      EIP: is_checkpointed_data+0x84/0xd0 [f2fs] SS:ESP: 0068:e264bd6c
      CR2: 0000000000000039
      ---[ end trace 9a4d4087cce6080a ]---
      
      This is because, in the recovery flow of __exchange_data_block, we passed len
      instead of olen to __roll_back_blkaddrs. len indicates the wrong array size, which
      results in copying a random block address into the dnode page.
      
      Later, once that random block address is accessed by is_checkpointed_data, it can
      cause a NULL pointer dereference.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to let caller retry allocating block address · fe16efe6
      Chao Yu authored
      With io_bits configured to 2 and LFS mode enabled, generic/013 reports the dmesg below:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000104
      *pdpt = 0000000029b7b001 *pde = 0000000000000000
      Oops: 0002 [#1] PREEMPT SMP
      Modules linked in: crc32_generic zram f2fs(O) rfcomm bnep bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev snd_seq_device aesni_intel snd_timer aes_i586 snd crypto_simd cryptd soundcore i2c_piix4 serio_raw mac_hid video parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000
      CPU: 0 PID: 11161 Comm: fsstress Tainted: G           O      4.17.0-rc2 #38
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs]
      EFLAGS: 00010206 CPU: 0
      EAX: e863dcd8 EBX: 00000000 ECX: 00000100 EDX: 00000200
      ESI: e863dcf4 EDI: f6f82768 EBP: e863dbb0 ESP: e863db74
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      CR0: 80050033 CR2: 00000104 CR3: 29a62020 CR4: 000406f0
      Call Trace:
       do_write_page+0x6f/0xc0 [f2fs]
       write_data_page+0x4a/0xd0 [f2fs]
       do_write_data_page+0x327/0x630 [f2fs]
       __write_data_page+0x34b/0x820 [f2fs]
       __f2fs_write_data_pages+0x42d/0x8c0 [f2fs]
       f2fs_write_data_pages+0x27/0x30 [f2fs]
       do_writepages+0x1a/0x70
       __filemap_fdatawrite_range+0x94/0xd0
       filemap_write_and_wait_range+0x3d/0xa0
       __generic_file_write_iter+0x11a/0x1f0
       f2fs_file_write_iter+0xdd/0x3b0 [f2fs]
       __vfs_write+0xd2/0x150
       vfs_write+0x9b/0x190
       ksys_write+0x45/0x90
       sys_write+0x16/0x20
       do_fast_syscall_32+0xaa/0x22c
       entry_SYSENTER_32+0x4c/0x7b
      EIP: 0xb7fc8c51
      EFLAGS: 00000246 CPU: 0
      EAX: ffffffda EBX: 00000003 ECX: 09cde000 EDX: 00001000
      ESI: 00000003 EDI: 00001000 EBP: 00000000 ESP: bfbded38
       DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      Code: e8 f9 77 34 c9 8b 45 e0 8b 80 b8 00 00 00 39 45 d8 0f 84 bb 02 00 00 8b 45 e0 8b 80 b8 00 00 00 8d 50 d8 8b 08 89 55 f0 8b 50 04 <89> 51 04 89 0a c7 00 00 01 00 00 c7 40 04 00 02 00 00 8b 45 dc
      EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs] SS:ESP: 0068:e863db74
      CR2: 0000000000000104
      ---[ end trace 4cac79c0d1305ee6 ]---
      
      allocate_data_block submits all sequential pending IOs, which are kept
      sorted in a FIFO list. If we fail to submit another user's IO due to an
      unaligned write, we retry allocating a new block address for the current
      IO, which initializes fio.list again. If the fio was already in the list,
      this can break the FIFO list and result in the panic above.
      
      Thread A			Thread B
      - do_write_page
       - allocate_data_block
        - list_add_tail
        : fioA cached in FIFO list.
      				- do_write_page
      				 - allocate_data_block
      				  - list_add_tail
      				  : fioB cached in FIFO list.
      				 - f2fs_submit_page_write
      				 : fail to submit IO
      				 - allocate_data_block
      				  - INIT_LIST_HEAD
       - f2fs_submit_page_write
        - list_del  <-- NULL pointer dereference
      
      This patch adds an fio.retry parameter to indicate the failure status of
      each IO, and avoids bailing out while there is still pending IO in the
      FIFO list.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
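Why re-running INIT_LIST_HEAD on an entry that is still queued corrupts the FIFO list can be shown with a tiny user-space list. This is a sketch under assumptions: `queue_fio()`, `list_is_consistent()` and `requeue_keeps_list_intact()` are hypothetical, and the `retry` guard shown here only illustrates the idea of tracking per-IO failure state; it is not the logic of the actual patch.

```c
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Every node's successor must point back at it. */
static bool list_is_consistent(const struct list_head *head)
{
	const struct list_head *p = head;

	do {
		if (p->next->prev != p)
			return false;
		p = p->next;
	} while (p != head);
	return true;
}

struct fio { struct list_head list; bool retry; };

/* Allocation step: (re)initialize and queue, unless the guard skips it. */
static void queue_fio(struct fio *fio, struct list_head *fifo, bool honor_retry)
{
	if (honor_retry && fio->retry)
		return;	/* already linked: leave the list pointers alone */
	init_list_head(&fio->list);
	list_add_tail(&fio->list, fifo);
}

/* Queue A and B, let B's submission fail, then let B's caller retry. */
static bool requeue_keeps_list_intact(bool honor_retry)
{
	struct list_head fifo;
	struct fio a = { .retry = false }, b = { .retry = false };

	init_list_head(&fifo);
	queue_fio(&a, &fifo, honor_retry);
	queue_fio(&b, &fifo, honor_retry);
	b.retry = true;	/* submission failed */
	queue_fio(&b, &fifo, honor_retry);
	return list_is_consistent(&fifo);
}
```

Without the guard, B ends up pointing at itself while A still points at B, which is the kind of broken linkage that a later list_del trips over.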
    • disable loading f2fs module on PAGE_SIZE > 4KB · 4071e67c
      Anatoly Pugachev authored
      The following patch disables loading of the f2fs module on architectures
      which have PAGE_SIZE > 4096, since it is impossible to mount f2fs on
      such architectures. The log messages are:
      mount: /mnt: wrong fs type, bad option, bad superblock on
      /dev/vdiskb1, missing codepage or helper program, or other error.
      /dev/vdiskb1: F2FS filesystem,
      UUID=1d8b9ca4-2389-4910-af3b-10998969f09c, volume name ""
      
      May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
      page_cache_size (8192), supports only 4KB
      May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS
      filesystem in 1th superblock
      May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
      page_cache_size (8192), supports only 4KB
      May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS
      filesystem in 2th superblock
      May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
      page_cache_size (8192), supports only 4KB
      
      which was introduced by git commit 5c9b4692
      
      tested on git kernel 4.17.0-rc6-00309-gec30dcf7
      
      with patch applied:
      
      modprobe: ERROR: could not insert 'f2fs': Invalid argument
      May 28 01:40:28 v215 kernel: F2FS not supported on PAGE_SIZE(8192) != 4096
      Signed-off-by: Anatoly Pugachev <matorola@gmail.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
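A load-time check of this kind could look roughly like the following user-space sketch. The function name `f2fs_check_page_size()` and the macro `F2FS_SUPPORTED_PAGE_SIZE` are hypothetical; only the message format and the "Invalid argument" result are taken from the log above.

```c
#include <errno.h>
#include <stdio.h>

#define F2FS_SUPPORTED_PAGE_SIZE 4096UL	/* hypothetical name */

/* Returns 0 when the page size is supported, -EINVAL otherwise. */
static int f2fs_check_page_size(unsigned long page_size)
{
	if (page_size != F2FS_SUPPORTED_PAGE_SIZE) {
		fprintf(stderr, "F2FS not supported on PAGE_SIZE(%lu) != %lu\n",
			page_size, F2FS_SUPPORTED_PAGE_SIZE);
		return -EINVAL;
	}
	return 0;
}
```

Failing early at module load gives a single clear message instead of repeated superblock errors at mount time.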
    • f2fs: fix error path of move_data_page · 14a28559
      Chao Yu authored
      This patch fixes the error path of move_data_page:
      - clear the cold data flag if it fails to write the page.
      - redirty the page for the non-ENOMEM case.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't drop dentry pages after fs shutdown · 1174abfd
      Chao Yu authored
      As described in commit "f2fs: don't drop any page on f2fs_cp_error()
      case":
      
      "We still provide readdir() after shutdown, so we should keep pages to
      avoid additional IOs."
      
      In order to provide the latest directory structure, let's keep dentry
      pages in cache after fs shutdown.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to avoid race during access gc_thread pointer · 250dbf51
      Chao Yu authored
      Thread A			Thread B
      - f2fs_remount
       - stop_gc_thread
      				- f2fs_sbi_store
         sbi->gc_thread = NULL;
      				  access sbi->gc_thread->gc_*
      
      Previously, we allocated memory for sbi->gc_thread based on the background
      gc thread mount option. The memory can be released if we turn off
      that mount option, but several places still access the gc_thread
      pointer without considering the race condition, resulting in a NULL
      pointer dereference.
      
      To fix this issue, use sb->s_umount to exclude those operations.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clean up with clear_radix_tree_dirty_tag · aec2f729
      Chao Yu authored
      Introduce clear_radix_tree_dirty_tag to share common cleanup code.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to don't trigger writeback during recovery · 64c74a7a
      Chao Yu authored
      - f2fs_fill_super
       - recover_fsync_data
        - recover_data
         - del_fsync_inode
          - iput
           - iput_final
            - write_inode_now
             - f2fs_write_inode
              - f2fs_balance_fs
               - f2fs_balance_fs_bg
                - sync_dirty_inodes
      
      With the data_flush mount option, to avoid entering the above writeback
      flow during recovery, let's detect recovery status and skip it in
      f2fs_balance_fs_bg.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clear discard_wake earlier · 35a9a766
      Sheng Yong authored
      If SBI_NEED_FSCK is set, discard_wake will never be cleared. As a
      result, the condition of wait_event_interruptible_timeout() is always
      true, which makes the discard thread run too frequently.
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: let discard thread wait a little longer if dev is busy · f9d1dced
      Yunlei He authored
      This patch modifies the discard thread wait policy as below:
              issued    io_interrupted    wait time (ms)
      1.        8              0                 50
      2.      (0,8)            1                 50
      3.        0              1                500 (dev is busy)
      4.        0              0              60000 (no candidates)
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
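The table above can be read as a small decision function. The sketch below is one possible encoding derived only from the four rows; the function and constant names are illustrative assumptions, not identifiers from the patch.

```c
/* Interval values taken from the table; the names are illustrative. */
enum {
	MIN_WAIT_MS = 50,
	MID_WAIT_MS = 500,
	MAX_WAIT_MS = 60000,
};

/* Map one round's outcome to the next wait time in milliseconds. */
static unsigned int discard_wait_ms(unsigned int issued, int io_interrupted)
{
	if (issued > 0)
		return MIN_WAIT_MS;	/* rows 1 and 2: some discards were issued */
	if (io_interrupted)
		return MID_WAIT_MS;	/* row 3: nothing issued, device is busy */
	return MAX_WAIT_MS;		/* row 4: no candidates */
}
```

The middle interval is the new case: when nothing could be issued because the device was busy, the thread backs off for longer than after a productive round but far less than when there is nothing to discard.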
    • f2fs: avoid stucking GC due to atomic write · 2ef79ecb
      Chao Yu authored
      f2fs doesn't allow abuse of the atomic write class interface, so besides
      limiting the total memory usage of in-mem pages, we also need to limit
      atomic-write usage when the filesystem is seriously fragmented.
      Otherwise we may run into an infinite loop during foreground GC, because
      target blocks in the victim segment belong to a file that has been
      atomically opened for a long time.
      
      Now, we detect failures due to atomic write in foreground GC. If
      the count exceeds a threshold, we drop all atomic written data in
      cache. By this, I expect it can keep our system running safely and
      prevent DoS attacks.
      
      In addition, this patch shows GC skip information in debugfs; for
      now it shows only the count of skips caused by atomic write.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce sbi->gc_mode to determine the policy · 5b0e9539
      Jaegeuk Kim authored
      This is to avoid sbi->gc_thread pointer access.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: keep migration IO order in LFS mode · 107a805d
      Chao Yu authored
      For non-migration IO, we keep the submission order of data/node blocks
      the same as the allocation sequence by sorting IOs in the per-log
      io_list, but migration IO could be submitted out of order.
      
      In LFS mode, we should keep all IOs ordered, including migration IO,
      so this patch adds an additional lock to keep the submission
      order.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to wait page writeback during revoking atomic write · e5e5732d
      Chao Yu authored
      After an atomic write is revoked, the related LBA can be reused by others,
      so we need to wait for page writeback before reusing the LBA, in order to
      avoid interference between the old in-flight atomic-write IO and new IO.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Fix deadlock in shutdown ioctl · 60b2b4ee
      Sahitya Tummala authored
      f2fs_ioc_shutdown() ioctl gets stuck in the below path
      when issued with F2FS_GOING_DOWN_FULLSYNC option.
      
      __switch_to+0x90/0xc4
      percpu_down_write+0x8c/0xc0
      freeze_super+0xec/0x1e4
      freeze_bdev+0xc4/0xcc
      f2fs_ioctl+0xc0c/0x1ce0
      f2fs_compat_ioctl+0x98/0x1f0
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: detect synchronous writeback more earlier · f8de4331
      Chao Yu authored
      This patch detects synchronous writeback earlier than before,
      in order to avoid unnecessary page writeback before exiting asynchronous
      writeback.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clean up with is_valid_blkaddr() · 7b525dd0
      Chao Yu authored
      - rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability.
      - introduce is_valid_blkaddr() for cleanup.
      
      No logic change in this patch.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to initialize min_mtime with ULLONG_MAX · 5ad25442
      Chao Yu authored
      Since sit_i.min_mtime's type is unsigned long long, we should
      initialize it with the max value of that type, ULLONG_MAX, instead of
      LLONG_MAX.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
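The effect of the seed value on a running minimum can be seen in a few lines of C. This is an illustrative sketch: `min_mtime()` and `sample_mtimes` are hypothetical and stand in for whatever scan computes the minimum in the filesystem.

```c
#include <limits.h>
#include <stddef.h>

/* Two sample values that both lie above LLONG_MAX. */
static const unsigned long long sample_mtimes[2] = {
	ULLONG_MAX - 1, ULLONG_MAX - 5,
};

/* Running minimum over n values, starting from the given seed. */
static unsigned long long min_mtime(const unsigned long long *mtimes, size_t n,
				    unsigned long long seed)
{
	unsigned long long min = seed;

	for (size_t i = 0; i < n; i++)
		if (mtimes[i] < min)
			min = mtimes[i];
	return min;
}
```

Seeded with `ULLONG_MAX`, the scan returns the true minimum of the samples; seeded with `LLONG_MAX`, it returns the seed itself because no sample is smaller.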
    • f2fs: fix to let checkpoint guarantee atomic page persistence · e7a4feb0
      Chao Yu authored
      1. thread A: commit_inmem_pages submits data into the block layer, but
      hasn't waited for its writeback.
      2. thread A: commit_inmem_pages updates the related node.
      3. thread B: does a checkpoint, flushing all nodes to disk.
      4. SPOR
      
      Then the atomic file becomes corrupted, since nodes are flushed before data.
      
      This patch treats atomic pages as checkpoint-guaranteed, so that during
      checkpoint we can make sure all atomic pages are written back together
      with the metadata of the atomic file.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to initialize i_current_depth according to inode type · 1c41e680
      Chao Yu authored
      i_current_depth is used only for directory inodes, but its space is
      shared with the i_gc_failures field used for regular inodes. To
      avoid affecting i_gc_failures' value, this patch initializes
      the union's fields according to inode type.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
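Type-dependent initialization of a shared union slot might be sketched as follows. The struct layout, the helper `new_inode_info()`, and the initial depth of 1 for new directories are assumptions made for illustration; only the two field names come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical in-memory inode info with one slot shared by two fields. */
struct inode_info {
	bool is_dir;
	union {
		unsigned int i_current_depth;	/* meaningful for directories */
		unsigned int i_gc_failures;	/* meaningful for regular files */
	};
};

/* Initialize the shared slot through the field that matches the type. */
static struct inode_info new_inode_info(bool is_dir)
{
	struct inode_info fi = { .is_dir = is_dir };

	if (is_dir)
		fi.i_current_depth = 1;	/* assumed starting depth */
	else
		fi.i_gc_failures = 0;	/* a fresh file has no GC failures */
	return fi;
}
```

Setting the depth unconditionally would leave a regular file starting with a non-zero failure count, because both names refer to the same storage.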
    • Revert "f2fs: add ovp valid_blocks check for bg gc victim to fg_gc" · 299254d8
      Chao Yu authored
      For an extreme case:
      10 sections, op = 10%, no_fggc_threshold = 90%
      Usage of all sections: 85% 85% 85% 85% 90% 90% 95% 95% 95% 95%
      
      During foreground GC, if we skip selecting dirty sections whose usage
      is larger than no_fggc_threshold, we can only recycle 80% invalid
      space from four 85% usage sections and two 90% usage sections,
      resulting in an out-of-space issue.
      
      This reverts commit e93b9865 to fix this issue. Besides, we keep the
      logic that scans all dirty sections when searching for a victim, so
      that GC can select the victim with the fewest valid blocks.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
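The 80% figure follows from simple arithmetic over the ten usage values. The sketch below recomputes it; `reclaimable_pct()` and `example_usage` are hypothetical names, and the result is expressed in percent of one section.

```c
#include <stddef.h>

/* Section usage from the example, in percent. */
static const unsigned int example_usage[10] = {
	85, 85, 85, 85, 90, 90, 95, 95, 95, 95,
};

/* Sum the invalid space of sections whose usage does not exceed the threshold. */
static unsigned int reclaimable_pct(const unsigned int *usage_pct, size_t n,
				    unsigned int threshold_pct)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		if (usage_pct[i] <= threshold_pct)	/* skip only usage > threshold */
			total += 100 - usage_pct[i];
	return total;
}
```

With the 90% threshold the sum is 4 x 15 + 2 x 10 = 80, whereas scanning every section yields 100, so the threshold hides a fifth of the reclaimable space in this case.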
    • f2fs: don't drop any page on f2fs_cp_error() case · 868de613
      Jaegeuk Kim authored
      We still provide readdir() after shutdown, so we should keep pages to avoid
      additional IOs.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix spelling mistake: "extenstion" -> "extension" · 4580038e
      Colin Ian King authored
      Trivial fix to spelling mistake in extension list text
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: enhance sanity_check_raw_super() to avoid potential overflows · 0cfe75c5
      Jaegeuk Kim authored
      To avoid the overflow issue below, we should check the boundaries in the
      superblock before reaching allocation. As Linus suggested, the right place
      is sanity_check_raw_super().
      
      Dr Silvio Cesare of InfoSect reported:
      
      There are integer overflows with using the cp_payload superblock field in the
      f2fs filesystem potentially leading to memory corruption.
      
      include/linux/f2fs_fs.h
      
      struct f2fs_super_block {
      ...
              __le32 cp_payload;
      
      fs/f2fs/f2fs.h
      
      typedef u32 block_t;    /*
                               * should not change u32, since it is the on-disk block
                               * address format, __le32.
                               */
      ...
      
      static inline block_t __cp_payload(struct f2fs_sb_info *sbi)
      {
              return le32_to_cpu(F2FS_RAW_SUPER(sbi)->cp_payload);
      }
      
      fs/f2fs/checkpoint.c
      
              block_t start_blk, orphan_blocks, i, j;
      ...
              start_blk = __start_cp_addr(sbi) + 1 + __cp_payload(sbi);
              orphan_blocks = __start_sum_addr(sbi) - 1 - __cp_payload(sbi);
      
      +++ integer overflows
      
      ...
              unsigned int cp_blks = 1 + __cp_payload(sbi);
      ...
              sbi->ckpt = kzalloc(cp_blks * blk_size, GFP_KERNEL);
      
      +++ integer overflow leading to incorrect heap allocation.
      
              int cp_payload_blks = __cp_payload(sbi);
      ...
              ckpt->cp_pack_start_sum = cpu_to_le32(1 + cp_payload_blks +
                              orphan_blocks);
      
      +++ sign bug and integer overflow
      
      ...
              for (i = 1; i < 1 + cp_payload_blks; i++)
      
      +++ integer overflow
      
      ...
      
            sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
                              NR_CURSEG_TYPE - __cp_payload(sbi)) *
                                      F2FS_ORPHANS_PER_BLOCK;
      
      +++ integer overflow
      Reported-by: Greg KH <greg@kroah.com>
      Reported-by: Silvio Cesare <silvio.cesare@gmail.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
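The wraparound in `1 + __cp_payload(sbi)` and a bound check that rejects such values up front can be sketched in user space. The specific bound used here (payload no larger than blocks per segment minus the number of checkpoint packs) is an assumption chosen for illustration, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors `1 + cp_payload` on a 32-bit type: wraps to 0 for UINT32_MAX. */
static uint32_t cp_blks_unchecked(uint32_t cp_payload)
{
	return 1u + cp_payload;
}

/* Assumed sanity bound: reject payloads that cannot fit in one segment. */
static bool cp_payload_is_sane(uint32_t cp_payload, uint32_t blocks_per_seg,
			       uint32_t cp_packs)
{
	if (blocks_per_seg < cp_packs)
		return false;	/* avoid wrapping in the subtraction below */
	return cp_payload <= blocks_per_seg - cp_packs;
}
```

Validating the on-disk field once, before any size is derived from it, means the later allocations and loops never see a wrapped value.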
    • f2fs: treat volatile file's data as hot one · b4c3ca8b
      Chao Yu authored
      A volatile file's data is updated often, so it is better to place
      its data in the hot data segment.
      
      In addition, for atomic files, we check FI_ATOMIC_FILE instead
      of FI_HOT_DATA to improve code readability.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce release_discard_addr() for cleanup · af8ff65b
      Chao Yu authored
      Introduce release_discard_addr() to share common cleanup code.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Fengguang Wu: declare static function, reported by kbuild test robot]
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix potential overflow · a9af3fdc
      Chao Yu authored
      In build_sit_entries(), if valid_blocks in the SIT block is smaller than
      valid_blocks in the journal, the calculation below:
      
      sbi->discard_blks += old_valid_blocks - se->valid_blocks;
      
      can overflow twice:
      - old_valid_blocks - se->valid_blocks overflows and becomes a very
      large number.
      - sbi->discard_blks += result overflows again and happens to produce a
      correct result.
      
      Anyway, it should be fixed.
      
      Fixes: d600af23 ("f2fs: avoid unneeded loop in build_sit_entries")
      Fixes: 1f43e2ad ("f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
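The double wraparound is easy to reproduce with 32-bit unsigned values. In this sketch the function names are hypothetical, and the explicit branch is just one way to avoid relying on wraparound; it is not claimed to be the form used in the patch.

```c
#include <stdint.h>

/* The intermediate difference wraps when old_valid < new_valid. */
static uint32_t delta_wrapping(uint32_t old_valid, uint32_t new_valid)
{
	return old_valid - new_valid;
}

/* Original form: two wraps that happen to cancel out. */
static uint32_t update_wrapping(uint32_t discard_blks, uint32_t old_valid,
				uint32_t new_valid)
{
	return discard_blks + delta_wrapping(old_valid, new_valid);
}

/* Explicit form: no intermediate value ever wraps. */
static uint32_t update_explicit(uint32_t discard_blks, uint32_t old_valid,
				uint32_t new_valid)
{
	if (old_valid >= new_valid)
		return discard_blks + (old_valid - new_valid);
	return discard_blks - (new_valid - old_valid);
}
```

Both forms agree on the final value, which is why the original code appeared to work; the explicit form simply makes the intent visible and keeps every intermediate in range.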
    • f2fs: rename dio_rwsem to i_gc_rwsem · b2532c69
      Chao Yu authored
      The RW semaphore dio_rwsem in struct f2fs_inode_info was introduced to
      avoid a race between dio and data gc, but it is now more widely used to
      avoid races between foreground operations and data gc. So rename it to
      i_gc_rwsem to improve readability.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: move mnt_want_write_file after range check · b82f6e34
      Yunlei He authored
      This patch moves mnt_want_write_file after the range check,
      since the arguments can be checked without it.
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix missing clear FI_NO_PREALLOC in some error case · cba41be0
      Yunlei He authored
      This patch fixes a missing clear of FI_NO_PREALLOC in some error cases.
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>