1. 14 Oct, 2014 1 commit
    • ext4: check s_chksum_driver when looking for bg csum presence · 813d32f9
      Darrick J. Wong authored
      Convert the ext4_has_group_desc_csum predicate to look for a checksum
      driver instead of the metadata_csum flag and change the bg checksum
      calculation function to look for GDT_CSUM before taking the crc16
      path.
      
      Without this patch, if we mount with ^uninit_bg,^metadata_csum and
      later metadata_csum gets turned on by accident, the block group
      checksum functions will incorrectly assume that checksumming is
      enabled (metadata_csum) but that crc16 should be used
      (!s_chksum_driver).  This is totally wrong, so fix the predicate
      and the checksum formula selection.
      
      (Granted, if the metadata_csum feature bit gets enabled on a live FS
      then something underhanded is going on, but we could at least avoid
      writing garbage into the on-disk fields.)
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Cc: stable@vger.kernel.org
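      The predicate and formula selection described in this commit can be
      modeled in userspace roughly as follows. This is a minimal sketch, not
      the kernel code: `struct sb_model`, `has_group_desc_csum()` and
      `bg_csum_algo()` are hypothetical names, and the checksum driver is
      reduced to an opaque pointer.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* ro_compat feature bits, mirroring the ext4 on-disk flag values. */
      #define RO_COMPAT_GDT_CSUM      0x0010u
      #define RO_COMPAT_METADATA_CSUM 0x0400u

      enum csum_algo { CSUM_NONE, CSUM_CRC16, CSUM_CRC32C };

      /* Hypothetical stand-in for the in-memory superblock state. */
      struct sb_model {
          uint32_t ro_compat;        /* feature bits as currently seen on disk */
          const void *chksum_driver; /* non-NULL only if a driver was set up at mount */
      };

      /* Checksumming is active if GDT_CSUM is set or a checksum driver exists. */
      static int has_group_desc_csum(const struct sb_model *sb)
      {
          return (sb->ro_compat & RO_COMPAT_GDT_CSUM) != 0 ||
                 sb->chksum_driver != NULL;
      }

      /* Pick the formula: crc32c needs the driver, crc16 needs GDT_CSUM. */
      static enum csum_algo bg_csum_algo(const struct sb_model *sb)
      {
          if (sb->chksum_driver != NULL)
              return CSUM_CRC32C;
          if (sb->ro_compat & RO_COMPAT_GDT_CSUM)
              return CSUM_CRC16;
          return CSUM_NONE;
      }
      ```

      In this model, a filesystem mounted without either feature whose
      metadata_csum bit is later flipped on disk still reports no checksumming,
      because no driver was ever loaded and GDT_CSUM is not set.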
  2. 13 Oct, 2014 2 commits
    • ext4: move error report out of atomic context in ext4_init_block_bitmap() · aef4885a
      Dmitry Monakhov authored
      Error reporting is likely to result in I/O, so it is a bad idea to do
      it from atomic context.
      
      This patch should fix the following issue:
      
      BUG: sleeping function called from invalid context at include/linux/buffer_head.h:349
      in_atomic(): 1, irqs_disabled(): 0, pid: 137, name: kworker/u128:1
      5 locks held by kworker/u128:1/137:
       #0:  ("writeback"){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0
       #1:  ((&(&wb->dwork)->work)){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0
       #2:  (jbd2_handle){......}, at: [<ffffffff81242622>] start_this_handle+0x712/0x7b0
       #3:  (&ei->i_data_sem){......}, at: [<ffffffff811fa387>] ext4_map_blocks+0x297/0x430
       #4:  (&(&bgl->locks[i].lock)->rlock){......}, at: [<ffffffff811f3180>] ext4_read_block_bitmap_nowait+0x5d0/0x630
      CPU: 3 PID: 137 Comm: kworker/u128:1 Not tainted 3.17.0-rc2-00184-g82752e4 #165
      Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
      Workqueue: writeback bdi_writeback_workfn (flush-1:0)
       0000000000000411 ffff880813777288 ffffffff815c7fdc ffff880813777288
       ffff880813a8bba0 ffff8808137772a8 ffffffff8108fb30 ffff880803e01e38
       ffff880803e01e38 ffff8808137772c8 ffffffff811a8d53 ffff88080ecc6000
      Call Trace:
       [<ffffffff815c7fdc>] dump_stack+0x51/0x6d
       [<ffffffff8108fb30>] __might_sleep+0xf0/0x100
       [<ffffffff811a8d53>] __sync_dirty_buffer+0x43/0xe0
       [<ffffffff811a8e03>] sync_dirty_buffer+0x13/0x20
       [<ffffffff8120f581>] ext4_commit_super+0x1d1/0x230
       [<ffffffff8120fa03>] save_error_info+0x23/0x30
       [<ffffffff8120fd06>] __ext4_error+0xb6/0xd0
       [<ffffffff8120f260>] ? ext4_group_desc_csum+0x140/0x190
       [<ffffffff811f2d8c>] ext4_read_block_bitmap_nowait+0x1dc/0x630
       [<ffffffff8122e23a>] ext4_mb_init_cache+0x21a/0x8f0
       [<ffffffff8113ae95>] ? lru_cache_add+0x55/0x60
       [<ffffffff8112e16c>] ? add_to_page_cache_lru+0x6c/0x80
       [<ffffffff8122eaa0>] ext4_mb_init_group+0x190/0x280
       [<ffffffff8122ec51>] ext4_mb_good_group+0xc1/0x190
       [<ffffffff8123309a>] ext4_mb_regular_allocator+0x17a/0x410
       [<ffffffff8122c821>] ? ext4_mb_use_preallocated+0x31/0x380
       [<ffffffff81233535>] ? ext4_mb_new_blocks+0x205/0x8e0
       [<ffffffff8116ed5c>] ? kmem_cache_alloc+0xfc/0x180
       [<ffffffff812335b0>] ext4_mb_new_blocks+0x280/0x8e0
       [<ffffffff8116f2c4>] ? __kmalloc+0x144/0x1c0
       [<ffffffff81221797>] ? ext4_find_extent+0x97/0x320
       [<ffffffff812257f4>] ext4_ext_map_blocks+0xbc4/0x1050
       [<ffffffff811fa387>] ? ext4_map_blocks+0x297/0x430
       [<ffffffff811fa3ab>] ext4_map_blocks+0x2bb/0x430
       [<ffffffff81200e43>] ? ext4_init_io_end+0x23/0x50
       [<ffffffff811feb44>] ext4_writepages+0x564/0xaf0
       [<ffffffff815cde3b>] ? _raw_spin_unlock+0x2b/0x40
       [<ffffffff810ac7bd>] ? lock_release_non_nested+0x2fd/0x3c0
       [<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490
       [<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490
       [<ffffffff811377e3>] do_writepages+0x23/0x40
       [<ffffffff8119c8ce>] __writeback_single_inode+0x9e/0x280
       [<ffffffff811a026b>] writeback_sb_inodes+0x2db/0x490
       [<ffffffff811a0664>] wb_writeback+0x174/0x2d0
       [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
       [<ffffffff811a0863>] wb_do_writeback+0xa3/0x200
       [<ffffffff811a0a40>] bdi_writeback_workfn+0x80/0x230
       [<ffffffff81085618>] ? process_one_work+0x228/0x4d0
       [<ffffffff810856cd>] process_one_work+0x2dd/0x4d0
       [<ffffffff81085618>] ? process_one_work+0x228/0x4d0
       [<ffffffff81085c1d>] worker_thread+0x35d/0x460
       [<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0
       [<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0
       [<ffffffff8108a885>] kthread+0xf5/0x100
       [<ffffffff810990e5>] ? local_clock+0x25/0x30
       [<ffffffff8108a790>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff815ce2ac>] ret_from_fork+0x7c/0xb0
       [<ffffffff8108a790>] ? __init_kthread_work
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
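      The general pattern behind this fix (detect the problem while the lock
      is held, report it only after the lock is dropped) can be sketched as
      below. This is a userspace illustration under stated assumptions:
      `struct group_lock`, `report_error()`, `init_bitmap_locked()` and
      `read_bitmap()` are hypothetical stand-ins, and the lock is just a flag.

      ```c
      #include <stdio.h>

      /* Hypothetical stand-in for a per-group spinlock; 1 means "held". */
      struct group_lock { int held; };

      static int reports_in_atomic; /* reports issued while the lock was held */
      static int reports_total;     /* all reports issued */

      /* Stand-in for an error report that may sleep because it can do I/O. */
      static void report_error(const struct group_lock *lk, const char *msg)
      {
          if (lk->held)
              reports_in_atomic++;  /* the pattern the patch avoids */
          reports_total++;
          fprintf(stderr, "error: %s\n", msg);
      }

      /* Runs under the lock: validate and return a status, never report. */
      static int init_bitmap_locked(int csum_ok)
      {
          return csum_ok ? 0 : -1;
      }

      /* Caller drops the lock first and reports afterwards. */
      static int read_bitmap(struct group_lock *lk, int csum_ok)
      {
          int err;

          lk->held = 1;                      /* lock */
          err = init_bitmap_locked(csum_ok);
          lk->held = 0;                      /* unlock */
          if (err)
              report_error(lk, "bad block group checksum");
          return err;
      }
      ```

      The key design point is that the locked helper only returns a status
      code, so the decision about where to report stays with the caller.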
    • ext4: Replace open coded mdata csum feature to helper function · 9aa5d32b
      Dmitry Monakhov authored
      Besides improving code readability, this replacement also protects
      against errors caused by direct EXT4_SB(sb)->s_es manipulation, which
      may result in an attempt to use uninitialized csum machinery.
      
      #Testcase_BEGIN
      IMG=/dev/ram0
      MNT=/mnt
      mkfs.ext4 $IMG
      mount $IMG $MNT
      #Enable feature directly on disk, on mounted fs
      tune2fs -O metadata_csum  $IMG
      # Provoke a metadata update, likely to result in an oops
      touch $MNT/test
      umount $MNT
      #Testcase_END
      
      # Replacement script
      @@
      expression E;
      @@
      - EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
      + ext4_has_metadata_csum(E)
      
      https://bugzilla.kernel.org/show_bug.cgi?id=82201
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  3. 11 Oct, 2014 2 commits
    • ext4: delete useless comments about ext4_move_extents · 65dd8327
      Xiaoguang Wang authored
      In the patch 'ext4: refactor ext4_move_extents code base', Dmitry Monakhov
      refactored the implementation of ext4_move_extents but did not update the
      corresponding comments.  This patch deletes the comments that are now useless.
      Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix reservation overflow in ext4_da_write_begin · 0ff8947f
      Eric Sandeen authored
      Delalloc write journal reservations only reserve 1 credit,
      to update the inode if necessary.  However, it may happen
      once in a filesystem's lifetime that a file will cross
      the 2G threshold, and require the LARGE_FILE feature to
      be set in the superblock as well, if it was not set already.
      
      This overruns the transaction reservation, and can be
      demonstrated simply on any ext4 filesystem without the LARGE_FILE
      feature already set:
      
      dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
      	conv=notrunc of=testfile
      sync
      dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
      	conv=notrunc of=testfile
      
      leads to:
      
      EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
      EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
      EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
      EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
      EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28
      
      Adjust the number of credits based on whether the flag is
      already set, and whether the current write may extend past the
      LARGE_FILE limit.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Cc: stable@vger.kernel.org
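      The credit adjustment described in this commit amounts to a small
      decision function. The sketch below is a userspace model under stated
      assumptions: `da_write_credits()` and `LARGE_FILE_LIMIT` are
      illustrative names, and the feature flag is passed in as a plain
      boolean instead of being read from a superblock.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Largest file size that does not require the LARGE_FILE feature. */
      #define LARGE_FILE_LIMIT 0x7fffffffULL

      /* Number of journal credits to reserve for one delalloc write. */
      static int da_write_credits(bool large_file_set, uint64_t pos, uint32_t len)
      {
          if (large_file_set)
              return 1;   /* flag already set: only the inode may change */
          if (pos + len <= LARGE_FILE_LIMIT)
              return 1;   /* write ends at or below the limit */
          return 2;       /* inode plus superblock update to set the flag */
      }
      ```

      With the two dd commands from the commit message, the first write
      (offset 2147483646, length 1) ends exactly at the limit and needs one
      credit, while the second (offset 2147483647, length 1) crosses it and
      needs two when the flag is not yet set.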
  4. 06 Oct, 2014 2 commits
    • ext4: add ext4_iget_normal() which is to be used for dir tree lookups · f4bb2981
      Theodore Ts'o authored
      If there is a corrupted file system which has directory entries that
      point at reserved, metadata inodes, prohibit them from being used by
      treating them the same way we treat Boot Loader inodes --- that is,
      mark them to be bad inodes.  This prohibits them from being opened,
      deleted, or modified via chmod, chown, utimes, etc.
      
      In particular, this prevents a corrupted file system which has a
      directory entry which points at the journal inode from being deleted
      and its blocks released, after which point Much Hilarity Ensues.
      Reported-by: Sami Liedes <sami.liedes@iki.fi>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
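      The core of such a lookup guard is a range check on the inode number.
      The following is a minimal sketch, assuming a hypothetical helper
      `ino_ok_for_dir_lookup()`; how the kernel signals the rejection (bad
      inode versus error code) is deliberately not modeled, and `first_ino`
      stands for the first non-reserved inode number of the filesystem.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      #define ROOT_INO 2u   /* root directory inode number in ext4 */

      /*
       * A directory entry may point at the root inode or at any
       * non-reserved inode; every other reserved inode is rejected.
       */
      static bool ino_ok_for_dir_lookup(uint32_t ino, uint32_t first_ino)
      {
          if (ino < first_ino && ino != ROOT_INO)
              return false;
          return true;
      }
      ```

      Assuming the common value of 11 for `first_ino`, the journal inode (#8)
      and the boot loader inode (#5) are both rejected, while the root
      directory and ordinary inodes pass.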
    • ext4: don't orphan or truncate the boot loader inode · e2bfb088
      Theodore Ts'o authored
      The boot loader inode (inode #5) should never be visible in the
      directory hierarchy, but it's possible if the file system is corrupted
      that there will be a directory entry that points at inode #5.  In
      order to avoid accidentally trashing it, when such a directory inode
      is opened, the inode will be marked as a bad inode, so that it's not
      possible to modify (or read) the inode from userspace.
      
      Unfortunately, when we unlink this (invalid/illegal) directory entry,
      we will put the bad inode on the orphan list, and then when we try to
      unlink the directory, we don't actually remove the bad inode from the
      orphan list before freeing the in-memory inode structure.  This means the
      in-memory orphan list is corrupted, leading to a kernel oops.
      
      In addition, avoid truncating a bad inode in ext4_destroy_inode(),
      since truncating the boot loader inode is not a smart thing to do.
      Reported-by: Sami Liedes <sami.liedes@iki.fi>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  5. 03 Oct, 2014 1 commit
    • ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT · 3e67cfad
      Dmitry Monakhov authored
      Otherwise this provokes a complaint like the following:
      WARNING: CPU: 12 PID: 5795 at fs/ext4/ext4_jbd2.c:48 ext4_journal_check_start+0x4e/0xa0()
      Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
      CPU: 12 PID: 5795 Comm: python Not tainted 3.17.0-rc2-00175-gae5344f #158
      Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
       0000000000000030 ffff8808116cfd28 ffffffff815c7dfc 0000000000000030
       0000000000000000 ffff8808116cfd68 ffffffff8106ce8c ffff8808116cfdc8
       ffff880813b16000 ffff880806ad6ae8 ffffffff81202008 0000000000000000
      Call Trace:
       [<ffffffff815c7dfc>] dump_stack+0x51/0x6d
       [<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff81202008>] ? ext4_ioctl+0x9e8/0xeb0
       [<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20
       [<ffffffff8122867e>] ext4_journal_check_start+0x4e/0xa0
       [<ffffffff81228c10>] __ext4_journal_start_sb+0x90/0x110
       [<ffffffff81202008>] ext4_ioctl+0x9e8/0xeb0
       [<ffffffff8107b0bd>] ? ptrace_stop+0x24d/0x2f0
       [<ffffffff81088530>] ? alloc_pid+0x480/0x480
       [<ffffffff8107b1f2>] ? ptrace_do_notify+0x92/0xb0
       [<ffffffff81186545>] do_vfs_ioctl+0x4e5/0x550
       [<ffffffff815cdbcb>] ? _raw_spin_unlock_irq+0x2b/0x40
       [<ffffffff81186603>] SyS_ioctl+0x53/0x80
       [<ffffffff815ce2ce>] tracesys+0xd0/0xd5
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  6. 02 Oct, 2014 6 commits
  7. 18 Sep, 2014 6 commits
  8. 16 Sep, 2014 4 commits
    • ext4: explicitly inform user about orphan list cleanup · 84474976
      Dmitry Monakhov authored
      Production filesystems are likely to be compiled/mounted without jbd
      debugging, so orphan list clearing would otherwise be silent.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • jbd2: jbd2_log_wait_for_space improve error detetcion · 1245799f
      Dmitry Monakhov authored
      If EIO happens after we have dropped j_state_lock, we won't notice
      that the journal has been aborted.  So it is reasonable to move this
      check after we have grabbed the j_checkpoint_mutex and re-grabbed the
      j_state_lock.  This patch helps to prevent a false-positive complaint
      after EIO.
      
      #DMESG:
      __jbd2_log_wait_for_space: needed 8448 blocks and only had 8386 space available
      __jbd2_log_wait_for_space: no way to get more journal space in ram1-8
      ------------[ cut here ]------------
      WARNING: CPU: 15 PID: 6739 at fs/jbd2/checkpoint.c:168 __jbd2_log_wait_for_space+0x188/0x200()
      Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
      CPU: 15 PID: 6739 Comm: fsstress Tainted: G        W      3.17.0-rc2-00429-g684de57 #139
      Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
       00000000000000a8 ffff88077aaab878 ffffffff815c1a8c 00000000000000a8
       0000000000000000 ffff88077aaab8b8 ffffffff8106ce8c ffff88077aaab898
       ffff8807c57e6000 ffff8807c57e6028 0000000000002100 ffff8807c57e62f0
      Call Trace:
       [<ffffffff815c1a8c>] dump_stack+0x51/0x6d
       [<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20
       [<ffffffff812419f8>] __jbd2_log_wait_for_space+0x188/0x200
       [<ffffffff8123be9a>] start_this_handle+0x4da/0x7b0
       [<ffffffff810990e5>] ? local_clock+0x25/0x30
       [<ffffffff810aba87>] ? lockdep_init_map+0xe7/0x180
       [<ffffffff8123c5bc>] jbd2__journal_start+0xdc/0x1d0
       [<ffffffff811f2414>] ? __ext4_new_inode+0x7f4/0x1330
       [<ffffffff81222a38>] __ext4_journal_start_sb+0xf8/0x110
       [<ffffffff811f2414>] __ext4_new_inode+0x7f4/0x1330
       [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
       [<ffffffff812025bb>] ext4_create+0x8b/0x150
       [<ffffffff8117fe3b>] vfs_create+0x7b/0xb0
       [<ffffffff8118097b>] do_last+0x7db/0xcf0
       [<ffffffff8117e31d>] ? inode_permission+0x4d/0x50
       [<ffffffff811845d2>] path_openat+0x242/0x590
       [<ffffffff81191a76>] ? __alloc_fd+0x36/0x140
       [<ffffffff81184a6a>] do_filp_open+0x4a/0xb0
       [<ffffffff81191b61>] ? __alloc_fd+0x121/0x140
       [<ffffffff81172f20>] do_sys_open+0x170/0x220
       [<ffffffff8117300e>] SyS_open+0x1e/0x20
       [<ffffffff811715d6>] SyS_creat+0x16/0x20
       [<ffffffff815c7e12>] system_call_fastpath+0x16/0x1b
      ---[ end trace cd71c831f82059db ]---
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • jbd2: free bh when descriptor block checksum fails · 064d8389
      Darrick J. Wong authored
      Free the buffer head if the journal descriptor block fails checksum
      verification.
      
      This is the jbd2 port of the e2fsprogs patch "e2fsck: free bh on csum
      verify error in do_one_pass".
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Cc: stable@vger.kernel.org
    • ext4: check EA value offset when loading · a0626e75
      Darrick J. Wong authored
      When loading extended attributes, check each entry's value offset to
      make sure it doesn't collide with the entries.
      
      Without this check it is easy to crash the kernel by mounting a
      malicious FS containing a file with an EA wherein e_value_offs = 0 and
      e_value_size > 0 and then deleting the EA, which corrupts the name
      list.
      
      (See the f_ea_value_crash test's FS image in e2fsprogs for an example.)
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
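      A bounds check of the kind this commit describes can be sketched as
      follows. This is an illustration under stated assumptions, not the
      kernel's implementation: `struct ea_entry_model` and
      `ea_value_in_bounds()` are hypothetical, offsets are measured from the
      start of the value region, `entries_end` is the offset just past the
      entry table, and `region_size` is the total size of the region.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical, already byte-swapped view of one EA entry. */
      struct ea_entry_model {
          uint16_t value_offs;   /* offset of the value within the region */
          uint32_t value_size;   /* length of the value in bytes */
      };

      /* A non-empty value must not overlap the entry table or run off the end. */
      static bool ea_value_in_bounds(const struct ea_entry_model *e,
                                     size_t entries_end, size_t region_size)
      {
          if (e->value_size == 0)
              return true;    /* empty values occupy no space */
          if (e->value_offs < entries_end)
              return false;   /* value would collide with the entries */
          if ((uint64_t)e->value_offs + e->value_size > region_size)
              return false;   /* value would extend past the region */
          return true;
      }
      ```

      The malicious case from the commit message (e_value_offs = 0 with
      e_value_size > 0) fails the second test, since offset 0 lies inside
      the entry table.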
  9. 11 Sep, 2014 6 commits
  10. 05 Sep, 2014 3 commits
  11. 04 Sep, 2014 6 commits
  12. 02 Sep, 2014 1 commit
    • ext4: track extent status tree shrinker delay statictics · eb68d0e2
      Zheng Liu authored
      This commit adds some statistics to the extent status tree shrinker.  The
      purpose of adding these is to collect more details when we
      encounter a stall caused by the extent status tree shrinker.  Here we count
      the following statistics:
        stats:
          the number of all objects on all extent status trees
          the number of reclaimable objects on lru list
          cache hits/misses
          the last sorted interval
          the number of inodes on lru list
        average:
          scan time for shrinking some objects
          the number of shrunk objects
        maximum:
          the inode that has max nr. of objects on lru list
          the maximum scan time for shrinking some objects
      
      The output looks like below:
        $ cat /proc/fs/ext4/sda1/es_shrinker_info
        stats:
          28228 objects
          6341 reclaimable objects
          5281/631 cache hits/misses
          586 ms last sorted interval
          250 inodes on lru list
        average:
          153 us scan time
          128 shrunk objects
        maximum:
          255 inode (255 objects, 198 reclaimable)
          125723 us max scan time
      
      If the lru list has never been sorted, the following line will not be
      printed:
          586 ms last sorted interval
      If there is an empty lru list, the following lines also will not be
      printed:
          250 inodes on lru list
        ...
        maximum:
          255 inode (255 objects, 198 reclaimable)
          0 us max scan time
      
      Meanwhile in this commit a new trace point is defined to print some
      details in __ext4_es_shrink().
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
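      Tracking the "average" and "maximum" figures listed in this commit
      could look roughly like the sketch below. Everything here is an
      assumption made for illustration: the names `struct shrink_stats_model`
      and `shrink_stats_update()` are hypothetical, and the 1/4 weighting of
      each new sample is a guess at a reasonable moving average, not
      something stated in the commit message.

      ```c
      #include <stdint.h>

      /* Hypothetical container for the per-filesystem shrinker figures. */
      struct shrink_stats_model {
          uint64_t avg_scan_us;  /* moving average of scan time */
          uint64_t max_scan_us;  /* longest scan seen so far */
          uint64_t avg_shrunk;   /* moving average of objects shrunk */
      };

      /* Fold one value into a moving average (assumed weight: 1/4 new). */
      static uint64_t fold_avg(uint64_t avg, uint64_t sample)
      {
          if (avg == 0)
              return sample;     /* first sample is taken as-is */
          return (sample + 3 * avg) / 4;
      }

      /* Record the outcome of one shrinker scan. */
      static void shrink_stats_update(struct shrink_stats_model *s,
                                      uint64_t scan_us, uint64_t nr_shrunk)
      {
          if (scan_us > s->max_scan_us)
              s->max_scan_us = scan_us;
          s->avg_scan_us = fold_avg(s->avg_scan_us, scan_us);
          s->avg_shrunk = fold_avg(s->avg_shrunk, nr_shrunk);
      }
      ```

      Keeping the maximum alongside the average is what makes a single long
      stall visible even when most scans are short.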