1. 11 May, 2018 1 commit
    • Al Viro's avatar
      do d_instantiate/unlock_new_inode combinations safely · 1e2e547a
      Al Viro authored
      
      For anything NFS-exported we do _not_ want to unlock new inode
      before it has grown an alias; original set of fixes got the
      ordering right, but missed the nasty complication in case of
      lockdep being enabled - unlock_new_inode() does
      	lockdep_annotate_inode_mutex_key(inode)
      which can only be done before anyone gets a chance to touch
      ->i_mutex.  Unfortunately, flipping the order and doing
      unlock_new_inode() before d_instantiate() opens a window when
      mkdir can race with open-by-fhandle on a guessed fhandle, leading
      to multiple aliases for a directory inode and all the breakage
      that follows from that.
      
      	Correct solution: a new primitive (d_instantiate_new())
      combining these two in the right order - lockdep annotate, then
      d_instantiate(), then the rest of unlock_new_inode().  All
      combinations of d_instantiate() with unlock_new_inode() should
      be converted to that.
      
      Cc: stable@kernel.org	# 2.6.29 and later
      Tested-by: default avatarMike Marshall <hubcap@omnibond.com>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      1e2e547a
  2. 17 Mar, 2018 3 commits
  3. 12 Mar, 2018 3 commits
    • Chao Yu's avatar
      f2fs: support hot file extension · b6a06cbb
      Chao Yu authored
      
      This patch supports to recognize hot file extension in f2fs, so that we
      can allocate proper hot segment location for its data, which can lead to
      better hot/cold seperation in filesystem.
      
      In addition, we changes a bit on query/add/del operation method for
      extension_list sysfs entry as below:
      
      - Query: cat /sys/fs/f2fs/<disk>/extension_list
      - Add: echo 'extension' > /sys/fs/f2fs/<disk>/extension_list
      - Del: echo '!extension' > /sys/fs/f2fs/<disk>/extension_list
      - Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
      - Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
      - [h] means add/del hot file extension
      - [c] means add/del cold file extension
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      b6a06cbb
    • Chao Yu's avatar
      f2fs: expose extension_list sysfs entry · 846ae671
      Chao Yu authored
      
      This patch adds a sysfs entry 'extension_list' to support
      query/add/del item in extension list.
      
      Query:
      cat /sys/fs/f2fs/<device>/extension_list
      
      Add:
      echo 'extension' > /sys/fs/f2fs/<device>/extension_list
      
      Del:
      echo '!extension' > /sys/fs/f2fs/<device>/extension_list
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      846ae671
    • Yunlong Song's avatar
      f2fs: don't put dentry page in pagecache into highmem · bdbc90fa
      Yunlong Song authored
      
      Previous dentry page uses highmem, which will cause panic in platforms
      using highmem (such as arm), since the address space of dentry pages
      from highmem directly goes into the decryption path via the function
      fscrypt_fname_disk_to_usr. But sg_init_one assumes the address is not
      from highmem, and then cause panic since it doesn't call kmap_high but
      kunmap_high is triggered at the end. To fix this problem in a simple
      way, this patch avoids to put dentry page in pagecache into highmem.
      Signed-off-by: default avatarYunlong Song <yunlong.song@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix coding style]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      bdbc90fa
  4. 25 Jan, 2018 1 commit
  5. 19 Jan, 2018 1 commit
    • Daeho Jeong's avatar
      f2fs: prevent newly created inode from being dirtied incorrectly · 9ac1e2d8
      Daeho Jeong authored
      
      Now, we invoke f2fs_mark_inode_dirty_sync() to make an inode dirty in
      advance of creating a new node page for the inode. By this, some inodes
      whose node page is not created yet can be linked into the global dirty
      list.
      
      If the checkpoint is executed at this moment, the inode will be written
      back by writeback_single_inode() and finally update_inode_page() will
      fail to detach the inode from the global dirty list because the inode
      doesn't have a node page.
      
      The problem is that the inode's state in VFS layer will become clean
      after execution of writeback_single_inode() and it's still linked in
      the global dirty list of f2fs and this will cause a kernel panic.
      
      So, we will prevent the newly created inode from being dirtied during
      the FI_NEW_INODE flag of the inode is set. We will make it dirty
      right after the flag is cleared.
      Signed-off-by: default avatarDaeho Jeong <daeho.jeong@samsung.com>
      Signed-off-by: default avatarYoungjin Gil <youngjin.gil@samsung.com>
      Tested-by: default avatarHobin Woo <hobin.woo@samsung.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      9ac1e2d8
  6. 12 Jan, 2018 2 commits
  7. 03 Jan, 2018 5 commits
  8. 06 Nov, 2017 3 commits
    • Chao Yu's avatar
      f2fs: support flexible inline xattr size · 6afc662e
      Chao Yu authored
      
      Now, in product, more and more features based on file encryption were
      introduced, their demand of xattr space is increasing, however, inline
      xattr has fixed-size of 200 bytes, once inline xattr space is full, new
      increased xattr data would occupy additional xattr block which may bring
      us more space usage and performance regression during persisting.
      
      In order to resolve above issue, it's better to expand inline xattr size
      flexibly according to user's requirement.
      
      So this patch introduces new filesystem feature 'flexible inline xattr',
      and new mount option 'inline_xattr_size=%u', once mkfs enables the
      feature, we can use the option to make f2fs supporting flexible inline
      xattr size.
      
      To support this feature, we add extra attribute i_inline_xattr_size in
      inode layout, indicating that how many space inline xattr borrows from
      block address mapping space in inode layout, by this, we can easily
      locate and store flexible-sized inline xattr data in inode.
      
      Inode disk layout:
        +----------------------+
        | .i_mode              |
        | ...                  |
        | .i_ext               |
        +----------------------+
        | .i_extra_isize       |
        | .i_inline_xattr_size |-----------+
        | ...                  |           |
        +----------------------+           |
        | .i_addr              |           |
        |  - block address or  |           |
        |  - inline data       |           |
        +----------------------+<---+      v
        |    inline xattr      |    +---inline xattr range
        +----------------------+<---+
        | .i_nid               |
        +----------------------+
        |   node_footer        |
        | (nid, ino, offset)   |
        +----------------------+
      
      Note that, we have to cnosider backward compatibility which reserved
      inline_data space, 200 bytes, all the time, reported by Sheng Yong.
      
      Previous inline data or directory always reserved 200 bytes in inode layout,
      even if inline_xattr is disabled. In order to keep inline_dentry's structure
      for backward compatibility, we get the space back only from inline_data.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Reported-by: default avatarSheng Yong <shengyong1@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      6afc662e
    • Jaegeuk Kim's avatar
      f2fs: add missing quota_initialize · d8d1389e
      Jaegeuk Kim authored
      
      This patch adds to call quota_intialize in f2fs_set_acl, f2fs_unlink,
      and f2fs_rename.
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      d8d1389e
    • Jaegeuk Kim's avatar
      f2fs: stop all the operations by cp_error flag · 1f227a3e
      Jaegeuk Kim authored
      
      This patch replaces to use cp_error flag instead of RDONLY for quota off.
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      1f227a3e
  9. 26 Oct, 2017 1 commit
  10. 05 Sep, 2017 1 commit
  11. 31 Jul, 2017 3 commits
    • Chao Yu's avatar
      f2fs: support project quota · 5c57132e
      Chao Yu authored
      
      This patch adds to support plain project quota.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      5c57132e
    • Chao Yu's avatar
      f2fs: record quota during dot{,dot} recovery · a6d3a479
      Chao Yu authored
      
      In ->lookup(), we will have a try to recover dot or dotdot for
      corrupted directory, once disk quota is on, if it allocates new
      block during dotdot recovery, we need to record disk quota info
      for the allocation, so this patch fixes this issue by adding
      missing dquot_initialize() in __recover_dot_dentries.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      a6d3a479
    • Chao Yu's avatar
      f2fs: enhance on-disk inode structure scalability · 7a2af766
      Chao Yu authored
      
      This patch add new flag F2FS_EXTRA_ATTR storing in inode.i_inline
      to indicate that on-disk structure of current inode is extended.
      
      In order to extend, we changed the inode structure a bit:
      
      Original one:
      
      struct f2fs_inode {
      	...
      	struct f2fs_extent i_ext;
      	__le32 i_addr[DEF_ADDRS_PER_INODE];
      	__le32 i_nid[DEF_NIDS_PER_INODE];
      }
      
      Extended one:
      
      struct f2fs_inode {
              ...
              struct f2fs_extent i_ext;
      	union {
      		struct {
      			__le16 i_extra_isize;
      			__le16 i_padding;
      			__le32 i_extra_end[0];
      		};
      		__le32 i_addr[DEF_ADDRS_PER_INODE];
      	};
              __le32 i_nid[DEF_NIDS_PER_INODE];
      }
      
      Once F2FS_EXTRA_ATTR is set, we will steal four bytes in the head of
      i_addr field for storing i_extra_isize and i_padding. with i_extra_isize,
      we can calculate actual size of reserved space in i_addr, available
      attribute fields included in total extra attribute fields for current
      inode can be described as below:
      
        +--------------------+
        | .i_mode            |
        | ...                |
        | .i_ext             |
        +--------------------+
        | .i_extra_isize     |-----+
        | .i_padding         |     |
        | .i_prjid           |     |
        | .i_atime_extra     |     |
        | .i_ctime_extra     |     |
        | .i_mtime_extra     |<----+
        | .i_inode_cs        |<----- store blkaddr/inline from here
        | .i_xattr_cs        |
        | ...                |
        +--------------------+
        |                    |
        |    block address   |
        |                    |
        +--------------------+
        | .i_nid             |
        +--------------------+
        |   node_footer      |
        | (nid, ino, offset) |
        +--------------------+
      
      Hence, with this patch, we would enhance scalability of f2fs inode for
      storing more newly added attribute.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      7a2af766
  12. 09 Jul, 2017 1 commit
    • Chao Yu's avatar
      f2fs: support plain user/group quota · 0abd675e
      Chao Yu authored
      
      This patch adds to support plain user/group quota.
      
      Change Note by Jaegeuk Kim.
      
      - Use f2fs page cache for quota files in order to consider garbage collection.
        so, quota files are not tolerable for sudden power-cuts, so user needs to do
        quotacheck.
      
      - setattr() calls dquot_transfer which will transfer inode->i_blocks.
        We can't reclaim that during f2fs_evict_inode(). So, we need to count
        node blocks as well in order to match i_blocks with dquot's space.
      
        Note that, Chao wrote a patch to count inode->i_blocks without inode block.
        (f2fs: don't count inode block in in-memory inode.i_blocks)
      
      - in f2fs_remount, we need to make RW in prior to dquot_resume.
      
      - handle fault_injection case during f2fs_quota_off_umount
      
      - TODO: Project quota
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      0abd675e
  13. 07 Jul, 2017 1 commit
    • Sheng Yong's avatar
      f2fs: do not set LOST_PINO for renamed dir · b855bf0e
      Sheng Yong authored
      
      After renaming a directory, fsck could detect unmatched pino. The scenario
      can be reproduced as the following:
      
      	$ mkdir /bar/subbar /foo
      	$ rename /bar/subbar /foo
      
      Then fsck will report:
      [ASSERT] (__chk_dots_dentries:1182)  --> Bad inode number[0x3] for '..', parent parent ino is [0x4]
      
      Rename sets LOST_PINO for old_inode. However, the flag cannot be cleared,
      since dir is written back with CP. So, let's get rid of LOST_PINO for a
      renamed dir and fix the pino directly at the end of rename.
      Signed-off-by: default avatarSheng Yong <shengyong1@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      b855bf0e
  14. 04 May, 2017 1 commit
  15. 12 Apr, 2017 1 commit
    • Jaegeuk Kim's avatar
      f2fs: fix fs corruption due to zero inode page · 9bb02c36
      Jaegeuk Kim authored
      
      This patch fixes the following scenario.
      
      - f2fs_create/f2fs_mkdir             - write_checkpoint
       - f2fs_mark_inode_dirty_sync         - block_operations
                                             - f2fs_lock_all
                                             - f2fs_sync_inode_meta
                                              - f2fs_unlock_all
                                              - sync_inode_metadata
       - f2fs_lock_op
                                               - f2fs_write_inode
                                                - update_inode_page
                                                 - get_node_page
                                                   return -ENOENT
       - new_inode_page
        - fill_node_footer
       - f2fs_mark_inode_dirty_sync
       - ...
       - f2fs_unlock_op
                                                - f2fs_inode_synced
                                             - f2fs_lock_all
                                             - do_checkpoint
      
      In this checkpoint, we can get an inode page which contains zeros having valid
      node footer only.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      9bb02c36
  16. 22 Mar, 2017 1 commit
    • Kinglong Mee's avatar
      f2fs: cleanup the disk level filename updating · d03ba4cc
      Kinglong Mee authored
      
      As discuss with Jaegeuk and Chao,
      "Once checkpoint is done, f2fs doesn't need to update there-in filename at all."
      
      The disk-level filename is used only one case,
      1. create a file A under a dir
      2. sync A
      3. godown
      4. umount
      5. mount (roll_forward)
      
      Only the rename/cross_rename changes the filename, if it happens,
      a. between step 1 and 2, the sync A will caused checkpoint, so that,
         the roll_forward at step 5 never happens.
      b. after step 2, the roll_forward happens, file A will roll forward
         to the result as after step 1.
      
      So that, any updating the disk filename is useless, just cleanup it.
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      d03ba4cc
  17. 21 Mar, 2017 1 commit
  18. 29 Jan, 2017 2 commits
  19. 31 Dec, 2016 1 commit
    • Eric Biggers's avatar
      fscrypt: use ENOKEY when file cannot be created w/o key · 54475f53
      Eric Biggers authored
      
      As part of an effort to clean up fscrypt-related error codes, make
      attempting to create a file in an encrypted directory that hasn't been
      "unlocked" fail with ENOKEY.  Previously, several error codes were used
      for this case, including ENOENT, EACCES, and EPERM, and they were not
      consistent between and within filesystems.  ENOKEY is a better choice
      because it expresses that the failure is due to lacking the encryption
      key.  It also matches the error code returned when trying to open an
      encrypted regular file without the key.
      
      I am not aware of any users who might be relying on the previous
      inconsistent error codes, which were never documented anywhere.
      
      This failure case will be exercised by an xfstest.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      54475f53
  20. 09 Dec, 2016 1 commit
  21. 23 Nov, 2016 1 commit
  22. 08 Oct, 2016 1 commit
  23. 01 Oct, 2016 1 commit
  24. 28 Sep, 2016 1 commit
  25. 27 Sep, 2016 1 commit
  26. 15 Sep, 2016 1 commit
    • Eric Biggers's avatar
      fscrypto: make filename crypto functions return 0 on success · ef1eb3aa
      Eric Biggers authored
      
      Several filename crypto functions: fname_decrypt(),
      fscrypt_fname_disk_to_usr(), and fscrypt_fname_usr_to_disk(), returned
      the output length on success or -errno on failure.  However, the output
      length was redundant with the value written to 'oname->len'.  It is also
      potentially error-prone to make callers have to check for '< 0' instead
      of '!= 0'.
      
      Therefore, make these functions return 0 instead of a length, and make
      the callers who cared about the return value being a length use
      'oname->len' instead.  For consistency also make other callers check for
      a nonzero result rather than a negative result.
      
      This change also fixes the inconsistency of fname_encrypt() actually
      already returning 0 on success, not a length like the other filename
      crypto functions and as documented in its function comment.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Acked-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      ef1eb3aa