1. 26 Aug, 2015 1 commit
    • Chao Yu's avatar
      f2fs: update extent tree in batches · 19b2c30d
      Chao Yu authored
      This patch introduce a new helper f2fs_update_extent_tree_range which can
      do extent mapping update at a specified range.
      
      The main idea is:
      1) punch all mapping info in extent node(s) which are at a specified range;
      2) try to merge new extent mapping with adjacent node, or failing that,
         insert the mapping into extent tree as a new node.
      
      In order to see the benefit, I add a function for stating time stamping
      count as below:
      
      uint64_t rdtsc(void)
      {
      	uint32_t lo, hi;
      	__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      	return (uint64_t)hi << 32 | lo;
      }
      
      My test environment is: ubuntu, intel i7-3770, 16G memory, 256g micron ssd.
      
      truncation path:	update extent cache from truncate_data_blocks_range
      non-truncataion path:	update extent cache from other paths
      total:			all update paths
      
      a) Removing 128MB file which has one extent node mapping whole range of
      file:
      1. dd if=/dev/zero of=/mnt/f2fs/128M bs=1M count=128
      2. sync
      3. rm /mnt/f2fs/128M
      
      Before:
      		total		count		average
      truncation:	7651022		32768		233.49
      
      Patched:
      		total		count		average
      truncation:	3321		33		100.64
      
      b) fsstress:
      fsstress -d /mnt/f2fs -l 5 -n 100 -p 20
      Test times:		5 times.
      
      Before:
      		total		count		average
      truncation:	5812480.6	20911.6		277.95
      non-truncation:	7783845.6	13440.8		579.12
      total:		13596326.2	34352.4		395.79
      
      Patched:
      		total		count		average
      truncation:	1281283.0	3041.6		421.25
      non-truncation:	7355844.4	13662.8		538.38
      total:		8637127.4	16704.4		517.06
      
      1) For the updates in truncation path:
       - we can see updating in batches leads total tsc and update count reducing
         explicitly;
       - besides, for a single batched updating, punching multiple extent nodes
         in a loop, result in executing more operations, so our average tsc
         increase intensively.
      2) For the updates in non-truncation path:
       - there is a little improvement, that is because for the scenario that we
         just need to update in the head or tail of extent node, new interface
         optimize to update info in extent node directly, rather than removing
         original extent node for updating and then inserting that updated one
         into cache as new node.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      19b2c30d
  2. 24 Aug, 2015 6 commits
  3. 22 Aug, 2015 9 commits
    • Chao Yu's avatar
      f2fs: lookup neighbor extent nodes for merging later · dac2ddef
      Chao Yu authored
      In __lookup_extent_tree_ret we will not try to find neighbor nodes if
      we find the target node, in this condition, we will lost the chance to
      merge the new mapping with exist extent node later.
      
      So our extent cache of inode will be fragmented after overwrite exist
      file, we can see the number of extent node increases intensively in
      following test case:
      
      dd if=/dev/zero of=/mnt/f2fs/4m bs=4K count=1024
      
      Extent Cache:
        - Hit Count: L1-1:0 L1-2:0 L2:0
        - Hit Ratio: 0% (0 / 3072)
        - Inner Struct Count: tree: 1, node: 1
      
      dd if=/dev/zero of=/mnt/f2fs/4m bs=4K count=1024 conv=notrunc
      
      Extent Cache:
        - Hit Count: L1-1:2048 L1-2:0 L2:0
        - Hit Ratio: 33% (2048 / 6144)
        - Inner Struct Count: tree: 1, node: 961
      
      This patch fixes to lookup neighbors of target node for further
      merging.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      dac2ddef
    • Chao Yu's avatar
      f2fs: split __insert_extent_tree_ret for readability · ef05e221
      Chao Yu authored
      This patch splits __insert_extent_tree_ret into __try_merge_extent_node &
      __insert_extent_tree for code readability.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      ef05e221
    • Chao Yu's avatar
      f2fs: kill dead code in __insert_extent_tree · a6f78345
      Chao Yu authored
      After commit 0f825ee6 ("f2fs: add new interfaces for extent tree"),
      f2fs_init_extent_tree becomes the only caller of __insert_extent_tree, and
      in f2fs_init_extent_tree, we will only insert extent node in an empty tree,
      so __try_{back,front}_merge in __insert_extent_tree will never be called.
      
      This patch removes these dead codes, besides, rename __insert_extent_tree
      to __init_extent_tree for readability.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      a6f78345
    • Chao Yu's avatar
      f2fs: adjust showing of extent cache stat · 029e13cc
      Chao Yu authored
      This patch alters to replace total hit stat with rbtree hit stat,
      and then adjust showing of extent cache stat:
      
      Hit Count:
      L1-1: for largest node hit count;
      L1-2: for last cached node hit count;
      L2: for extent node hit after lookuping in rbtree.
      
      Hit Ratio:
      ratio (hit count / total lookup count)
      
      Inner Struct Count:
      tree count, node count.
      
      Before:
      Extent Hit Ratio: 0 / 2
      
      Extent Tree Count: 3
      
      Extent Node Count: 2
      
      Patched:
      Exten Cacache:
        - Hit Count: L1-1:4871 L1-2:2074 L2:208
        - Hit Ratio: 1% (7153 / 550751)
        - Inner Struct Count: tree: 26560, node: 11824
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      029e13cc
    • Chao Yu's avatar
      f2fs: add largest/cached stat in extent cache · 91c481ff
      Chao Yu authored
      This patch adds to stat the hit count of largest/cached node for showing
      in debugfs.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      91c481ff
    • Chao Yu's avatar
      f2fs: fix incorrect mapping for bmap · e2b4e2bc
      Chao Yu authored
      The test step is like below:
      1. touch file
      2. truncate -s $((1024*1024)) file
      3. fallocate -o 0 -l $((1024*1024)) file
      4. fibmap.f2fs file
      
      Our result of fibmap.f2fs showed below is not correct:
      
      file_pos   start_blk     end_blk        blks
             0    -937166132    -937166132           1
          4096    -937166132    -937166132           1
          8192    -937166132    -937166132           1
         12288    -937166132    -937166132           1
         16384    -937166132    -937166132           1
         20480    -937166132    -937166132           1
      ...
       1040384    -937166132    -937166132           1
       1044480    -937166132    -937166132           1
      
      This is because f2fs_map_blocks will return with no error when meeting
      a hole or preallocated block, the caller __get_data_block will map the
      uninitialized variable value to bh->b_blocknr.
      
      Unfortunately generic_block_bmap will neither check the return value of
      get_data() nor check mapping info of buffer_head, result in returning
      the random block address.
      
      After fixing the issue, our result shows correctly:
      
      file_pos   start_blk     end_blk        blks
             0           0           0         256
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      e2b4e2bc
    • Chao Yu's avatar
      f2fs: add annotation for space utilization of regular/inline dentry · c031f6a9
      Chao Yu authored
      Add annotation to let us know more clearly about space utilization
      information of regular dentry and inline dentry.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c031f6a9
    • Fan Li's avatar
      f2fs: fix to update cached_en of extent tree properly · f8b703da
      Fan Li authored
      In f2fs_lookup_extent_tree, et->cached_en was read and updated with only
      read lock held,
      it could cause __lookup_extent_tree within return entirely wrong
      extent_node, if other
      thread update et->cached_en just before __lookup_extent_tree return.
      
      However, there are two things about this patch that need to be noticed:
      1. It does no good to arrange the order of concurrent read/write, the result
      would still
      be random in such case.
      2. It's built on this assumption: the mix up of reads and writes on a single
      pointer would
      not make the pointer partially wrong at any time. Please let me know if I'm
      wrong, thx.
      Signed-off-by: default avatarFan li <fanofcode.li@samsung.com>
      Reviewed-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      f8b703da
    • Junesung Lee's avatar
      f2fs: fix typo · 217940d4
      Junesung Lee authored
      Fix typo.
      Signed-off-by: default avatarJunesung Lee <junesoung412@gmail.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      217940d4
  4. 20 Aug, 2015 13 commits
  5. 14 Aug, 2015 3 commits
  6. 11 Aug, 2015 2 commits
    • Chao Yu's avatar
      f2fs: remove inmem radix tree · decd36b6
      Chao Yu authored
      Previously, we use radix tree to index all registered page entries for
      atomic file, but now we only use radix tree to see whether current page
      is indexed or not, since the other user of radix tree is gone in commit
      042b7816 ("f2fs: remove unnecessary call to invalidate inmemory pages").
      
      So in this patch, we try to use one more efficient way:
      Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private
      value to indicate page indexing status. By using this way, we can save
      memory and lookup time.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      decd36b6
    • Chao Yu's avatar
      f2fs: report EINVAL for unalignment direct IO · c15e8599
      Chao Yu authored
      We run ltp testcase with f2fs and obtain a TFAIL in diotest4, the result in
      detail is as fallow:
      
      dio04
      
      <<<test_start>>>
      tag=dio04 stime=1432278894
      cmdline="diotest4"
      contacts=""
      analysis=exit
      <<<test_output>>>
      diotest4    1  TPASS  :  Negative Offset
      diotest4    2  TPASS  :  removed
      diotest4    3  TFAIL  :  diotest4.c:129: write allows odd count.returns 1: Success
      diotest4    4  TFAIL  :  diotest4.c:183: Odd count of read and write
      diotest4    5  TPASS  :  Read beyond the file size
      ......
      
      the result of ext4 with same environment:
      
      dio04
      
      <<<test_start>>>
      tag=dio04 stime=1432259643
      cmdline="diotest4"
      contacts=""
      analysis=exit
      <<<test_output>>>
      diotest4    1  TPASS  :  Negative Offset
      diotest4    2  TPASS  :  removed
      diotest4    3  TPASS  :  Odd count of read and write
      diotest4    4  TPASS  :  Read beyond the file size
      ......
      
      The reason is that when triggering DIO in f2fs, we will return zero value
      in ->direct_IO if writer's buffer offset, file offset and transfer size is
      not alignment to block size of filesystem, resulting in falling back into
      buffered write instead of returning -EINVAL.
      
      This patch fixes that problem by returning correct error number for above
      case, and removing the judgement condition in check_direct_IO to make sure
      the verification will be enabled for direct reader too.
      
      Besides, Jaegeuk Kim pointed out that there is expectional cases we should
      always make direct-io falling back into buffered write, such as dio in
      encrypted file.
      Signed-off-by: default avatarYunlei He <heyunlei@huawei.com>
      [Chao Yu make small change and add detail description in commit message]
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c15e8599
  7. 10 Aug, 2015 1 commit
  8. 06 Aug, 2015 2 commits
    • Chao Yu's avatar
      f2fs: recover invalid/reserved block address for fsynced file · 12a8343e
      Chao Yu authored
      When testing with generic/101 in xfstests, error message outputed as below:
      
          --- tests/generic/101.out
          +++ results//generic/101.out.bad
          @@ -10,10 +10,14 @@
           File foo content after log replay:
           0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
           *
          -0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          +0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
           *
           0372000
          ...
          (Run 'diff -u tests/generic/101.out results/generic/101.out.bad'  to see the entire diff)
      
      The test flow is like below:
      1. pwrite foo -S 0xaa 0 64K
      2. pwrite foo -S 0xbb 64K 61K
      3. sync
      4. truncate foo 64K
      5. truncate foo 125K
      6. fsync foo
      7. flakey drop writes
      8. umount
      
      After this test, we expect the data of recovered file will have the first
      64k of data filling with value 0xaa and the next 61k of data filling with
      value 0x00 because we have fsynced it before dropping writes in dm.
      
      In f2fs, during recovering, we will only recover the valid block address
      in direct node page if it is marked as a fsynced dnode, but block address
      which means invalid/reserved (with value NULL_ADDR/NEW_ADDR) will not be
      recovered. So, the file recovered shows its incorrect data 0xbb in range of
      [61k, 125k].
      
      In this patch, we fix to recover invalid/reserved block during recover flow.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      12a8343e
    • Fan Li's avatar
      f2fs: use extent cache to optimize f2fs_reserve_block · 759af1c9
      Fan Li authored
      In some cases, we only need the block address when we call
      f2fs_reserve_block,
      other fields of struct dnode_of_data aren't necessary.
      We can try extent cache first for such cases in order to speed up the
      process.
      Signed-off-by: default avatarFan li <fanofcode.li@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      759af1c9
  9. 05 Aug, 2015 3 commits
    • Chao Yu's avatar
      f2fs: invalidate temporary meta page · e90c2d28
      Chao Yu authored
      To avoid meeting garbage data in next free node block at the end of warm
      node chain when doing recovery, we will try to zero out that invalid block.
      
      If the device is not support discard, our way for zeroing out block is:
      grabbing a temporary zeroed page in meta inode, then, issue write request
      with this page.
      
      But, we forget to release that temporary page, so our memory usage will
      increase without gaining any hit ratio benefit, so it's better to free it
      for saving memory.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      e90c2d28
    • Chao Yu's avatar
      f2fs: fix to release inode page correctly · 470f00e9
      Chao Yu authored
      In following call path, we will pass a locked and referenced ipage
      pointer to get_new_data_page:
       - init_inode_metadata
        - make_empty_dir
         - get_new_data_page
      
      There are two exit paths in get_new_data_page when error occurs:
      1) grab_cache_page fails, ipage will not be released;
      2) f2fs_reserve_block fails, ipage will be released in callee.
      
      So, it's not consistent for error handling in get_new_data_page.
      
      For f2fs_reserve_block, it's not very easy to change the rule
      of error handling, since it's already complicated.
      
      Here we deside to choose an easy way to fix this issue:
      If any error occur in get_new_data_page, we will ensure releasing
      ipage in this function.
      
      The same issue is in f2fs_convert_inline_dir, fix that too.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      470f00e9
    • Liu Xue's avatar
      f2fs: unify f2fs_bug_on when check blocks and segment · 7a04f64d
      Liu Xue authored
      Replace BUG_ON with f2fs_bug_on to deal with
      block and segment validity check failed.
      Signed-off-by: default avatarXue Liu <liuxueliu.liu@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      7a04f64d