1. 06 Jul, 2017 2 commits
  2. 05 Jul, 2017 1 commit
    • Alexander Potapenko's avatar
      fs: generic_block_bmap(): initialize all of the fields in the temp bh · 2a527d68
      Alexander Potapenko authored
      KMSAN (KernelMemorySanitizer, a new error detection tool) reports the
      use of uninitialized memory in ext4_update_bh_state():
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory
      CPU: 3 PID: 1 Comm: swapper/0 Tainted: G    B           4.8.0-rc6+ #597
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
       0000000000000282 ffff88003cc96f68 ffffffff81f30856 0000003000000008
       ffff88003cc96f78 0000000000000096 ffffffff8169742a ffff88003cc96ff8
       ffffffff812fc1fc 0000000000000008 ffff88003a1980e8 0000000100000000
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff81f30856>] dump_stack+0xa6/0xc0 lib/dump_stack.c:51
       [<ffffffff812fc1fc>] kmsan_report+0x1ec/0x300 mm/kmsan/kmsan.c:?
       [<ffffffff812fc33b>] __msan_warning+0x2b/0x40 ??:?
       [<     inline     >] ext4_update_bh_state fs/ext4/inode.c:727
       [<ffffffff8169742a>] _ext4_get_block+0x6ca/0x8a0 fs/ext4/inode.c:759
       [<ffffffff81696d4c>] ext4_get_block+0x8c/0xa0 fs/ext4/inode.c:769
       [<ffffffff814a2d36>] generic_block_bmap+0x246/0x2b0 fs/buffer.c:2991
       [<ffffffff816ca30e>] ext4_bmap+0x5ee/0x660 fs/ext4/inode.c:3177
      ...
      origin description: ----tmp@generic_block_bmap
      ==================================================================
      
      (the line numbers are relative to 4.8-rc6, but the bug persists
      upstream)
      
      The local |tmp| is created in generic_block_bmap() and then passed into
      ext4_bmap() => ext4_get_block() => _ext4_get_block() =>
      ext4_update_bh_state(). Along the way tmp.b_page is never initialized
      before ext4_update_bh_state() checks its value.
      
      [ Use the approach suggested by Kees Cook of initializing the whole bh
        structure.]
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      2a527d68
  3. 04 Jul, 2017 1 commit
    • Tahsin Erdogan's avatar
      ext4: change fast symlink test to not rely on i_blocks · 407cd7fb
      Tahsin Erdogan authored
      ext4_inode_info->i_data is the storage area for 4 types of data:
      
        a) Extents data
        b) Inline data
        c) Block map
        d) Fast symlink data (symlink length < 60)
      
      Extents data case is positively identified by EXT4_INODE_EXTENTS flag.
      Inline data case is also obvious because of EXT4_INODE_INLINE_DATA
      flag.
      
      Distinguishing c) and d) however requires additional logic. This
      currently relies on i_blocks count. After subtracting external xattr
      block from i_blocks, if it is greater than 0 then we know that some
      data blocks exist, so there must be a block map.
      
      This logic got broken after ea_inode feature was added. That feature
      charges the data blocks of external xattr inodes to the referencing
      inode and so adds them to the i_blocks. To fix this, we could subtract
      ea_inode blocks by iterating through all xattr entries and then check
      whether remaining i_blocks count is zero. Besides being complicated,
      this won't change the fact that the current way of distinguishing
      between c) and d) is fragile.
      
      The alternative solution is to test whether i_size is less than 60 to
      determine fast symlink case. ext4_symlink() uses the same test to decide
      whether to store the symlink in i_data. There is one caveat to address
      before this can work though.
      
      If an inode's i_nlink is zero during eviction, its i_size is set to
      zero and its data is truncated. If system crashes before inode is removed
      from the orphan list, next boot orphan cleanup may find the inode with
      zero i_size. So, a symlink that had its data stored in a block may now
      appear to be a fast symlink. The solution used in this patch is to treat
      i_size = 0 as a non-fast symlink case. A zero sized symlink is not legal
      so the only time this can happen is the mentioned scenario. This is also
      logically correct because a i_size = 0 symlink has no data stored in
      i_data.
      Suggested-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarTahsin Erdogan <tahsin@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      407cd7fb
  4. 23 Jun, 2017 9 commits
    • Eric Biggers's avatar
      ext4: require key for truncate(2) of encrypted file · 63136858
      Eric Biggers authored
      Currently, filesystems allow truncate(2) on an encrypted file without
      the encryption key.  However, it's impossible to correctly handle the
      case where the size being truncated to is not a multiple of the
      filesystem block size, because that would require decrypting the final
      block, zeroing the part beyond i_size, then encrypting the block.
      
      As other modifications to encrypted file contents are prohibited without
      the key, just prohibit truncate(2) as well, making it fail with ENOKEY.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      63136858
    • Eric Biggers's avatar
      ext4: don't bother checking for encryption key in ->mmap() · 66e0aaad
      Eric Biggers authored
      Since only an open file can be mmap'ed, and we only allow open()ing an
      encrypted file when its key is available, there is no need to check for
      the key again before permitting each mmap().
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      66e0aaad
    • Chao Yu's avatar
      ext4: check return value of kstrtoull correctly in reserved_clusters_store · 1ea1516f
      Chao Yu authored
      kstrtoull returns 0 on success, however, in reserved_clusters_store we
      will return -EINVAL if kstrtoull returns 0, it makes us fail to update
      reserved_clusters value through sysfs.
      
      Fixes: 76d33bca
      Cc: stable@vger.kernel.org # 4.4
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarMiao Xie <miaoxie@huawei.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      1ea1516f
    • Darrick J. Wong's avatar
      ext4: fix off-by-one fsmap error on 1k block filesystems · 4a495624
      Darrick J. Wong authored
      For 1k-block filesystems, the filesystem starts at block 1, not block 0.
      This fact is recorded in s_first_data_block, so use that to bump up the
      start_fsb before we start querying the filesystem for its space map.
      Without this, ext4/026 fails on 1k block ext4 because various functions
      (notably ext4_get_group_no_and_offset) don't know what to do with an
      fsblock that is "before" the start of the filesystem and return garbage
      results (blockgroup 2^32-1, etc.) that confuse fsmap.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      4a495624
    • Theodore Ts'o's avatar
      ext4: return EFSBADCRC if a bad checksum error is found in ext4_find_entry() · bdddf342
      Theodore Ts'o authored
      Previously a bad directory block with a bad checksum is skipped; we
      should be returning EFSBADCRC (aka EBADMSG).
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      bdddf342
    • Khazhismel Kumykov's avatar
      ext4: return EIO on read error in ext4_find_entry · 6febe6f2
      Khazhismel Kumykov authored
      Previously, a read error would be ignored and we would eventually return
      NULL from ext4_find_entry, which signals "no such file or directory". We
      should be returning EIO.
      Signed-off-by: default avatarKhazhismel Kumykov <khazhy@google.com>
      6febe6f2
    • Eric Biggers's avatar
      ext4: forbid encrypting root directory · 9ce0151a
      Eric Biggers authored
      Currently it's possible to encrypt all files and directories on an ext4
      filesystem by deleting everything, including lost+found, then setting an
      encryption policy on the root directory.  However, this is incompatible
      with e2fsck because e2fsck expects to find, create, and/or write to
      lost+found and does not have access to any encryption keys.  Especially
      problematic is that if e2fsck can't find lost+found, it will create it
      without regard for whether the root directory is encrypted.  This is
      wrong for obvious reasons, and it causes a later run of e2fsck to
      consider the lost+found directory entry to be corrupted.
      
      Encrypting the root directory may also be of limited use because it is
      the "all-or-nothing" use case, for which dm-crypt can be used instead.
      (By design, encryption policies are inherited and cannot be overridden;
      so the root directory having an encryption policy implies that all files
      and directories on the filesystem have that same encryption policy.)
      
      In any case, encrypting the root directory is broken currently and must
      not be allowed; so start returning an error if userspace requests it.
      For now only do this in ext4, because f2fs and ubifs do not appear to
      have the lost+found requirement.  We could move it into
      fscrypt_ioctl_set_policy() later if desired, though.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      9ce0151a
    • Daeho Jeong's avatar
      ext4: send parallel discards on commit completions · a0154344
      Daeho Jeong authored
      Now, when we mount ext4 filesystem with '-o discard' option, we have to
      issue all the discard commands for the blocks to be deallocated and
      wait for the completion of the commands on the commit complete phase.
      Because this procedure might involve a lot of sequential combinations of
      issuing discard commands and waiting for that, the delay of this
      procedure might be too much long, even to 17.0s in our test,
      and it results in long commit delay and fsync() performance degradation.
      
      To reduce this kind of delay, instead of adding callback for each
      extent and handling all of them in a sequential manner on commit phase,
      we instead add a separate list of extents to free to the superblock and
      then process this list at once after transaction commits so that
      we can issue all the discard commands in a parallel manner like XFS
      filesystem.
      
      Finally, we could enhance the discard command handling performance.
      The result was such that 17.0s delay of a single commit in the worst
      case has been enhanced to 4.8s.
      Signed-off-by: default avatarDaeho Jeong <daeho.jeong@samsung.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Tested-by: default avatarHobin Woo <hobin.woo@samsung.com>
      Tested-by: default avatarKitae Lee <kitae87.lee@samsung.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      a0154344
    • Jan Kara's avatar
      ext4: avoid unnecessary stalls in ext4_evict_inode() · 3abb1a0f
      Jan Kara authored
      These days inode reclaim calls evict_inode() only when it has no pages
      in the mapping.  In that case it is not necessary to wait for transaction
      commit in ext4_evict_inode() as there can be no pages waiting to be
      committed.  So avoid unnecessary transaction waiting in that case.
      
      We still have to keep the check for the case where ext4_evict_inode()
      gets called from other paths (e.g. umount) where inode still can have
      some page cache pages.
      Reported-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      3abb1a0f
  5. 22 Jun, 2017 27 commits