1. 14 Aug, 2015 1 commit
  2. 11 Aug, 2015 2 commits
    • Chao Yu's avatar
      f2fs: remove inmem radix tree · decd36b6
      Chao Yu authored
      Previously, we use radix tree to index all registered page entries for
      atomic file, but now we only use radix tree to see whether current page
      is indexed or not, since the other user of radix tree is gone in commit
      042b7816 ("f2fs: remove unnecessary call to invalidate inmemory pages").
      
      So in this patch, we try to use one more efficient way:
      Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private
      value to indicate page indexing status. By using this way, we can save
      memory and lookup time.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      decd36b6
    • Chao Yu's avatar
      f2fs: report EINVAL for unalignment direct IO · c15e8599
      Chao Yu authored
      We run ltp testcase with f2fs and obtain a TFAIL in diotest4, the result in
      detail is as fallow:
      
      dio04
      
      <<<test_start>>>
      tag=dio04 stime=1432278894
      cmdline="diotest4"
      contacts=""
      analysis=exit
      <<<test_output>>>
      diotest4    1  TPASS  :  Negative Offset
      diotest4    2  TPASS  :  removed
      diotest4    3  TFAIL  :  diotest4.c:129: write allows odd count.returns 1: Success
      diotest4    4  TFAIL  :  diotest4.c:183: Odd count of read and write
      diotest4    5  TPASS  :  Read beyond the file size
      ......
      
      the result of ext4 with same environment:
      
      dio04
      
      <<<test_start>>>
      tag=dio04 stime=1432259643
      cmdline="diotest4"
      contacts=""
      analysis=exit
      <<<test_output>>>
      diotest4    1  TPASS  :  Negative Offset
      diotest4    2  TPASS  :  removed
      diotest4    3  TPASS  :  Odd count of read and write
      diotest4    4  TPASS  :  Read beyond the file size
      ......
      
      The reason is that when triggering DIO in f2fs, we will return zero value
      in ->direct_IO if writer's buffer offset, file offset and transfer size is
      not alignment to block size of filesystem, resulting in falling back into
      buffered write instead of returning -EINVAL.
      
      This patch fixes that problem by returning correct error number for above
      case, and removing the judgement condition in check_direct_IO to make sure
      the verification will be enabled for direct reader too.
      
      Besides, Jaegeuk Kim pointed out that there is expectional cases we should
      always make direct-io falling back into buffered write, such as dio in
      encrypted file.
      Signed-off-by: default avatarYunlei He <heyunlei@huawei.com>
      [Chao Yu make small change and add detail description in commit message]
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c15e8599
  3. 10 Aug, 2015 1 commit
  4. 06 Aug, 2015 2 commits
    • Chao Yu's avatar
      f2fs: recover invalid/reserved block address for fsynced file · 12a8343e
      Chao Yu authored
      When testing with generic/101 in xfstests, error message outputed as below:
      
          --- tests/generic/101.out
          +++ results//generic/101.out.bad
          @@ -10,10 +10,14 @@
           File foo content after log replay:
           0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
           *
          -0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          +0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
           *
           0372000
          ...
          (Run 'diff -u tests/generic/101.out results/generic/101.out.bad'  to see the entire diff)
      
      The test flow is like below:
      1. pwrite foo -S 0xaa 0 64K
      2. pwrite foo -S 0xbb 64K 61K
      3. sync
      4. truncate foo 64K
      5. truncate foo 125K
      6. fsync foo
      7. flakey drop writes
      8. umount
      
      After this test, we expect the data of recovered file will have the first
      64k of data filling with value 0xaa and the next 61k of data filling with
      value 0x00 because we have fsynced it before dropping writes in dm.
      
      In f2fs, during recovering, we will only recover the valid block address
      in direct node page if it is marked as a fsynced dnode, but block address
      which means invalid/reserved (with value NULL_ADDR/NEW_ADDR) will not be
      recovered. So, the file recovered shows its incorrect data 0xbb in range of
      [61k, 125k].
      
      In this patch, we fix to recover invalid/reserved block during recover flow.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      12a8343e
    • Fan Li's avatar
      f2fs: use extent cache to optimize f2fs_reserve_block · 759af1c9
      Fan Li authored
      In some cases, we only need the block address when we call
      f2fs_reserve_block,
      other fields of struct dnode_of_data aren't necessary.
      We can try extent cache first for such cases in order to speed up the
      process.
      Signed-off-by: default avatarFan li <fanofcode.li@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      759af1c9
  5. 05 Aug, 2015 19 commits
  6. 04 Aug, 2015 15 commits
    • Chao Yu's avatar
      f2fs: expose f2fs_write_cache_pages · 8f46dcae
      Chao Yu authored
      If there are gced dirty pages and normal dirty pages in the mapping
      of one inode, we might writeback them alternately with discontinuous
      block address, resulting in low performance.
      
      This patch introduces f2fs_write_cache_pages with codes copied from
      write_cache_pages in mm/page-writeback.c.
      
      In this function, we refactor flow with two steps:
      1) writeback all cold type pages.
      2) writeback all non-cold type pages.
      
      By using this method, f2fs will writeback dirty pages with the same
      temperature in bunch mode, it makes writeouted block being with
      more continuous address, so they can be merged as much as possible
      in f2fs bio cache, and also it will reduce the chance of submiting
      small IO from block layer.
      
      Test environment: 8g nokia sd card (very old sd card, but it shows
      better effect when testing with this patch, and with a 32g kingston
      sd card, I didn't see much more improvement).
      
      Test step:
      1. touch testfile;
      2. truncate -s 512K testfile;
      3. write all pages with odd index;
      4. trigger gc by ioctl;
      5. write all pages with even index;
      6. time fsync testfile.
      
      before:
      real	0m0.402s
      user	0m0.000s
      sys	0m0.000s
      
      after:
      real	0m0.143s
      user	0m0.004s
      sys	0m0.004s
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      8f46dcae
    • Chao Yu's avatar
      f2fs: correct return value of ->setxattr · 037fe70c
      Chao Yu authored
      This patch fixes to return correct error number of ->setxattr, which
      is reported by xfstest tests/generic/026 as below:
      
      generic/026      - output mismatch
          --- tests/generic/026.out
          +++ results/generic/026.out.bad
          @@ -4,6 +4,6 @@
           1 below acl max
           acl max
           1 above acl max
          -chacl: cannot set access acl on "largeaclfile": Argument list too long
          +chacl: cannot set access acl on "largeaclfile": Numerical result out of range
           use 16 aces
           use 17 aces
          ...
      Ran: generic/026
      Failures: generic/026
      Failed 1 of 1 tests
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      037fe70c
    • Chao Yu's avatar
      f2fs: cleanup write_orphan_inodes · bd936f84
      Chao Yu authored
      Previously, since 'commit 4531929e ("f2fs: move grabing orphan
      pages out of protection region")' was committed, in write_orphan_inodes(),
      we will grab all meta page in a batch before we use them under spinlock,
      so that we can avoid large time delay of grabbing meta pages under
      spinlock.
      
      Now, 'commit d6c67a4f ("f2fs: revmove spin_lock for
      write_orphan_inodes")' remove the spinlock in write_orphan_inodes,
      so there is no issue we describe above, we'd better recover to move
      the grab operation to original place for readability.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      bd936f84
    • Chao Yu's avatar
      f2fs: warm up cold page after mmaped write · 5b339124
      Chao Yu authored
      With cost-benifit method, background gc will consider old section with
      fewer valid blocks as candidate victim, these old blocks in section will
      be treated as cold data, and laterly will be moved into cold segment.
      
      But if the gcing page is attached by user through buffered or mmaped
      write, we should reset the page as non-cold one, because this page may
      have more opportunity for further updating.
      
      So fix to add clearing code for the missed 'mmap' case.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      5b339124
    • Chao Yu's avatar
      f2fs: add new ioctl F2FS_IOC_GARBAGE_COLLECT · c1c1b583
      Chao Yu authored
      When background gc is off, the only way to trigger gc is executing
      a force gc in some operations who wants to grab space in disk.
      
      The executing condition is limited: to execute force gc, we should
      wait for the time when there is almost no more free section for LFS
      allocation. This seems not reasonable for our user who wants to
      control triggering gc by himself.
      
      This patch introduces F2FS_IOC_GARBAGE_COLLECT interface for
      triggering garbage collection by using ioctl. It provides our users
      one more option to trigger gc.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c1c1b583
    • Chao Yu's avatar
      f2fs: maintain extent cache in separated file · a28ef1f5
      Chao Yu authored
      This patch moves extent cache related code from data.c into extent_cache.c
      since extent cache is independent feature, and its codes are not relate to
      others in data.c, it's better for us to maintain them in separated place.
      
      There is no functionality change, but several small coding style fixes
      including:
      * rename __drop_largest_extent to f2fs_drop_largest_extent for exporting;
      * rename misspelled word 'untill' to 'until';
      * remove unneeded 'return' in the end of f2fs_destroy_extent_tree().
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      a28ef1f5
    • Fan Li's avatar
      f2fs: don't try to split extents shorter than F2FS_MIN_EXTENT_LEN · 3c7df87d
      Fan Li authored
      Since only parts of extents longer than F2FS_MIN_EXTENT_LEN will
      be kept in extent cache after split, extents already shorter than
      F2FS_MIN_EXTENT_LEN don't need to try split at all.
      Signed-off-by: default avatarFan Li <fanofcode.li@samsung.com>
      Reviewed-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      3c7df87d
    • Chao Yu's avatar
      f2fs: fix to update page flag · 90d4388a
      Chao Yu authored
      This patch fixes to update page flag (e.g. Uptodate/cold flag) in
      ->write_begin.
      
      Otherwise, page will be non-uptodate when we try to write entire
      page, and cold data flag in page will not be clean when gced page
      is being rewritten.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      90d4388a
    • Jaegeuk Kim's avatar
      f2fs: shrink unreferenced extent_caches first · 7023a1ad
      Jaegeuk Kim authored
      If an extent_tree entry has a zero reference count, we can drop it from the
      cache in higher priority rather than currently referencing entries.
      Reviewed-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      7023a1ad
    • Chao Yu's avatar
      f2fs: enhance multithread performance · bb96a8d5
      Chao Yu authored
      In ->writepages, we use writepages mutex lock to serialize all block
      address allocation and page submitting pairs from different inodes.
      This method makes our delayed dirty pages of one inode being written
      continously as many as possible.
      
      But there is one problem that we did not submit current cached bio in
      protection region of writepages mutex lock, so there is a small chance
      that we submit the one of other thread's as below, resulting in
      splitting more bios.
      
      thread 1			thread 2
      ->writepages
        lock(writepages)
        ->write_cache_pages
        unlock(writepages)
      				  lock(writepages)
      				  ->write_cache_pages
        ->f2fs_submit_merged_bio
      				    ->writepage
      				  unlock(writepages)
      
      fs_mark-6535  [002] ....  2242.270230: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5766152, size = 524288
      fs_mark-6536  [000] ....  2242.270361: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767176, size = 4096
      fs_mark-6536  [000] ....  2242.270370: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, NODE, sector = 8138112, size = 4096
      fs_mark-6535  [002] ....  2242.270776: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767184, size = 516096
      
      This may really increase time of block layer works, and may cause
      larger IO lantency.
      
      This patch moves the submitting operation into region of writepages
      mutex lock to avoid bio splits when concurrently writebacking is
      intensive.
      
      my test environment: virtual machine,
      intel cpu i5 2500, 8GB size memory, 4GB size ramdisk
      
      time fs_mark  -t  16  -L  1  -s  524288  -S  1  -d  /mnt/f2fs/
      
      before:
      real	0m4.244s
      user	0m0.088s
      sys	0m12.336s
      
      after:
      real	0m3.822s
      user	0m0.072s
      sys	0m10.760s
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      bb96a8d5
    • Chao Yu's avatar
      f2fs: restrict multimedia filename · 741a7bea
      Chao Yu authored
      When testing with fs_mark, some blocks were written out as cold
      data which were mixed with warm data, resulting in splitting more
      bios.
      
      This is because fs_mark will create file with random filename as
      below:
      
      559551ee~~~~~~~~15Z29OCC05JCKQP60JQ42MKV
      559551ee~~~~~~~~NZAZ6X8OA8LHIIP6XD0L58RM
      559551ef~~~~~~~~B15YDSWAK789HPSDZKYTW6WM
      559551f1~~~~~~~~2DAE5DPS79785BUNTFWBEMP3
      559551f1~~~~~~~~1MYDY0BKSQCJPI32Q8C514RM
      559551f1~~~~~~~~YQOTMAOMN5CVRFOUNI026MP4
      559551f3~~~~~~~~1WF42LPRTQJNPPGR3EINKMPE
      559551f3~~~~~~~~8Y2NRK7CEPPAA02LY936PJPG
      
      They are regarded as cold file since their filename are ended with
      multimedia files' extension, but this should be wrong as we only
      match the extension of filename, not the whole one.
      
      In this patch, we try to fix the format of multimedia filename to:
      "filename + '.' + extension", then we set cold file only its
      filename matches the format.
      
      So after this change, it will reduce the probability we set the
      wrong cold file, also it helps a little for fs_mark's performance
      on f2fs.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      741a7bea
    • Chao Yu's avatar
      MAINTAINERS: add missed trace file for f2fs · 62d43eeb
      Chao Yu authored
      This patch adds missed trace file in maintainer-ship of f2fs,
      so it completes the description of files maintained in f2fs,
      and also it allows people to find correct mailing list by using
      get_maintainer.pl when only patching the trace file of f2fs.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      62d43eeb
    • Nicholas Krause's avatar
      f2fs: make the function check_dnode have a return type of bool and change it's name to is_alive · c1079892
      Nicholas Krause authored
      This makes the function check_dnode have a return type of bool
      due to this particular function only ever returning either one
      or zero as its return value and changes the name of the function
      to is_alive in order to better explain this function's intended
      work of checking if a dnode is still in use by the filesystem.
      Signed-off-by: default avatarNicholas Krause <xerofoify@gmail.com>
      [Jaegeuk Kim: change the return value check for the renamed function]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c1079892
    • Jaegeuk Kim's avatar
      f2fs: check the largest extent at look-up time · 84bc926c
      Jaegeuk Kim authored
      Because of the extent shrinker or other -ENOMEM scenarios, it cannot guarantee
      that the largest extent would be cached in the tree all the time.
      
      Instead of relying on extent_tree, we can simply check the cached one in extent
      tree accordingly.
      Reviewed-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      84bc926c
    • Jaegeuk Kim's avatar
      f2fs: use extent_cache by default · 3e72f721
      Jaegeuk Kim authored
      We don't need to handle the duplicate extent information.
      
      The integrated rule is:
       - update on-disk extent with largest one tracked by in-memory extent_cache
       - destroy extent_tree for the truncation case
       - drop per-inode extent_cache by shrinker
      Reviewed-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      3e72f721