1. 02 Oct, 2014 1 commit
    • Jan Kara's avatar
      vfs: fix data corruption when blocksize < pagesize for mmaped data · 90a80202
      Jan Kara authored
      ->page_mkwrite() is used by filesystems to allocate blocks under a page
      which is becoming writeably mmapped in some process' address space. This
      allows a filesystem to return a page fault if there is not enough space
      available, user exceeds quota or similar problem happens, rather than
      silently discarding data later when writepage is called.
      
      However VFS fails to call ->page_mkwrite() in all the cases where
      filesystems need it when blocksize < pagesize. For example when
      blocksize = 1024, pagesize = 4096 the following is problematic:
        ftruncate(fd, 0);
        pwrite(fd, buf, 1024, 0);
        map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
        map[0] = 'a';       ----> page_mkwrite() for index 0 is called
        ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
        mremap(map, 1024, 10000, 0);
        map[4095] = 'a';    ----> no page_mkwrite() called
      
      At the moment ->page_mkwrite() is called, filesystem can allocate only
      one block for the page because i_size == 1024. Otherwise it would create
      blocks beyond i_size which is generally undesirable. But later at
      ->writepage() time, we also need to store data at offset 4095 but we
      don't have block allocated for it.
      
      This patch introduces a helper function filesystems can use to have
      ->page_mkwrite() called at all the necessary moments.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      90a80202
  2. 18 Sep, 2014 6 commits
  3. 16 Sep, 2014 4 commits
    • Dmitry Monakhov's avatar
      ext4: explicitly inform user about orphan list cleanup · 84474976
      Dmitry Monakhov authored
      Production fs likely compiled/mounted w/o jbd debugging, so orphan
      list clearing will be silent.
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      84474976
    • Dmitry Monakhov's avatar
      jbd2: jbd2_log_wait_for_space improve error detetcion · 1245799f
      Dmitry Monakhov authored
      If EIO happens after we have dropped j_state_lock, we won't notice
      that the journal has been aborted.  So it is reasonable to move this
      check after we have grabbed the j_checkpoint_mutex and re-grabbed the
      j_state_lock.  This patch helps to prevent false positive complain
      after EIO.
      
      #DMESG:
      __jbd2_log_wait_for_space: needed 8448 blocks and only had 8386 space available
      __jbd2_log_wait_for_space: no way to get more journal space in ram1-8
      ------------[ cut here ]------------
      WARNING: CPU: 15 PID: 6739 at fs/jbd2/checkpoint.c:168 __jbd2_log_wait_for_space+0x188/0x200()
      Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
      CPU: 15 PID: 6739 Comm: fsstress Tainted: G        W      3.17.0-rc2-00429-g684de57 #139
      Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
       00000000000000a8 ffff88077aaab878 ffffffff815c1a8c 00000000000000a8
       0000000000000000 ffff88077aaab8b8 ffffffff8106ce8c ffff88077aaab898
       ffff8807c57e6000 ffff8807c57e6028 0000000000002100 ffff8807c57e62f0
      Call Trace:
       [<ffffffff815c1a8c>] dump_stack+0x51/0x6d
       [<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20
       [<ffffffff812419f8>] __jbd2_log_wait_for_space+0x188/0x200
       [<ffffffff8123be9a>] start_this_handle+0x4da/0x7b0
       [<ffffffff810990e5>] ? local_clock+0x25/0x30
       [<ffffffff810aba87>] ? lockdep_init_map+0xe7/0x180
       [<ffffffff8123c5bc>] jbd2__journal_start+0xdc/0x1d0
       [<ffffffff811f2414>] ? __ext4_new_inode+0x7f4/0x1330
       [<ffffffff81222a38>] __ext4_journal_start_sb+0xf8/0x110
       [<ffffffff811f2414>] __ext4_new_inode+0x7f4/0x1330
       [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
       [<ffffffff812025bb>] ext4_create+0x8b/0x150
       [<ffffffff8117fe3b>] vfs_create+0x7b/0xb0
       [<ffffffff8118097b>] do_last+0x7db/0xcf0
       [<ffffffff8117e31d>] ? inode_permission+0x4d/0x50
       [<ffffffff811845d2>] path_openat+0x242/0x590
       [<ffffffff81191a76>] ? __alloc_fd+0x36/0x140
       [<ffffffff81184a6a>] do_filp_open+0x4a/0xb0
       [<ffffffff81191b61>] ? __alloc_fd+0x121/0x140
       [<ffffffff81172f20>] do_sys_open+0x170/0x220
       [<ffffffff8117300e>] SyS_open+0x1e/0x20
       [<ffffffff811715d6>] SyS_creat+0x16/0x20
       [<ffffffff815c7e12>] system_call_fastpath+0x16/0x1b
      ---[ end trace cd71c831f82059db ]---
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      1245799f
    • Darrick J. Wong's avatar
      jbd2: free bh when descriptor block checksum fails · 064d8389
      Darrick J. Wong authored
      Free the buffer head if the journal descriptor block fails checksum
      verification.
      
      This is the jbd2 port of the e2fsprogs patch "e2fsck: free bh on csum
      verify error in do_one_pass".
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Cc: stable@vger.kernel.org
      064d8389
    • Darrick J. Wong's avatar
      ext4: check EA value offset when loading · a0626e75
      Darrick J. Wong authored
      When loading extended attributes, check each entry's value offset to
      make sure it doesn't collide with the entries.
      
      Without this check it is easy to crash the kernel by mounting a
      malicious FS containing a file with an EA wherein e_value_offs = 0 and
      e_value_size > 0 and then deleting the EA, which corrupts the name
      list.
      
      (See the f_ea_value_crash test's FS image in e2fsprogs for an example.)
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      a0626e75
  4. 11 Sep, 2014 6 commits
  5. 05 Sep, 2014 3 commits
  6. 04 Sep, 2014 6 commits
  7. 02 Sep, 2014 6 commits
    • Zheng Liu's avatar
      ext4: track extent status tree shrinker delay statictics · eb68d0e2
      Zheng Liu authored
      This commit adds some statictics in extent status tree shrinker.  The
      purpose to add these is that we want to collect more details when we
      encounter a stall caused by extent status tree shrinker.  Here we count
      the following statictics:
        stats:
          the number of all objects on all extent status trees
          the number of reclaimable objects on lru list
          cache hits/misses
          the last sorted interval
          the number of inodes on lru list
        average:
          scan time for shrinking some objects
          the number of shrunk objects
        maximum:
          the inode that has max nr. of objects on lru list
          the maximum scan time for shrinking some objects
      
      The output looks like below:
        $ cat /proc/fs/ext4/sda1/es_shrinker_info
        stats:
          28228 objects
          6341 reclaimable objects
          5281/631 cache hits/misses
          586 ms last sorted interval
          250 inodes on lru list
        average:
          153 us scan time
          128 shrunk objects
        maximum:
          255 inode (255 objects, 198 reclaimable)
          125723 us max scan time
      
      If the lru list has never been sorted, the following line will not be
      printed:
          586ms last sorted interval
      If there is an empty lru list, the following lines also will not be
      printed:
          250 inodes on lru list
        ...
        maximum:
          255 inode (255 objects, 198 reclaimable)
          0 us max scan time
      
      Meanwhile in this commit a new trace point is defined to print some
      details in __ext4_es_shrink().
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      eb68d0e2
    • Zheng Liu's avatar
      ext4: improve extents status tree trace point · e963bb1d
      Zheng Liu authored
      This commit improves the trace point of extents status tree.  We rename
      trace_ext4_es_shrink_enter in ext4_es_count() because it is also used
      in ext4_es_scan() and we can not identify them from the result.
      
      Further this commit fixes a variable name in trace point in order to
      keep consistency with others.
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      e963bb1d
    • Seunghun Lee's avatar
      ext4: fix comments about get_blocks · d91bd2c1
      Seunghun Lee authored
      get_blocks is renamed to get_block.
      Signed-off-by: default avatarSeunghun Lee <waydi1@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      d91bd2c1
    • Darrick J. Wong's avatar
      ext4: enable block_validity by default · 45f1a9c3
      Darrick J. Wong authored
      Enable by default the block_validity feature, which checks for
      collisions between newly allocated blocks and critical system
      metadata.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      45f1a9c3
    • Theodore Ts'o's avatar
      jbd2: fold __wait_cp_io into jbd2_log_do_checkpoint() · 88fe1acb
      Theodore Ts'o authored
      __wait_cp_io() is only called by jbd2_log_do_checkpoint().  Fold it in
      to make it a bit easier to understand.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      88fe1acb
    • Theodore Ts'o's avatar
      jbd2: fold __process_buffer() into jbd2_log_do_checkpoint() · be1158cc
      Theodore Ts'o authored
      __process_buffer() is only called by jbd2_log_do_checkpoint(), and it
      had a very complex locking protocol where it would be called with the
      j_list_lock, and sometimes exit with the lock held (if the return code
      was 0), or release the lock.
      
      This was confusing both to humans and to smatch (which erronously
      complained that the lock was taken twice).
      
      Folding __process_buffer() to the caller allows us to simplify the
      control flow, making the resulting function easier to read and reason
      about, and dropping the compiled size of fs/jbd2/checkpoint.c by 150
      bytes (over 4% of the text size).
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      be1158cc
  8. 01 Sep, 2014 8 commits