1. 09 Feb, 2013 1 commit
    • Theodore Ts'o's avatar
      ext4: pass context information to jbd2__journal_start() · 9924a92a
      Theodore Ts'o authored
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms.  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      9924a92a
  2. 08 Feb, 2013 2 commits
  3. 07 Feb, 2013 2 commits
  4. 04 Feb, 2013 1 commit
    • Theodore Ts'o's avatar
      ext4: optimize mballoc for large allocations · 40ae3487
      Theodore Ts'o authored
      The ext4 block allocator only maintains buddy bitmaps for chunks which
      are less than or equal to one quarter of a block group.  That is, for
      a file aystem with a 1k blocksize, and where the number of blocks in a
      block group is 8192 blocks, the largest chunk size tracked by buddy
      bitmaps is 2048 blocks.
      
      For a file system with a 4k blocksize, and where the number of blocks
      in a block group is 32768 blocks, the largest chunk size tracked by
      buddy bitmaps is 8192 blocks.
      
      To work around this code, mballoc.c before this commit would truncate
      allocation requests to the number of blocks in a block group minus 10.
      Why 10?  Aside from being a completely arbitrary number, it avoids
      block allocation to be a power of two larger than 25% of the block
      group.  If you try to explicitly fallocate 50% of the block group
      size, this will demonstrate the problem; the block allocation code
      will scan the all of the blocks in the file system with cr==0 (since
      the request is for a natural power of two), but then completely fail
      for all blocks groups, since the buddy bitmaps don't track chunk sizes
      of 50% of the block group.
      
      To fix this, in these we use ext4_mb_complex_scan_group() instead of
      ext4_mb_simple_scan_group().
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger@dilger.ca>
      40ae3487
  5. 03 Feb, 2013 4 commits
  6. 02 Feb, 2013 4 commits
  7. 30 Jan, 2013 2 commits
  8. 29 Jan, 2013 8 commits
  9. 28 Jan, 2013 8 commits
  10. 25 Jan, 2013 3 commits
  11. 17 Jan, 2013 1 commit
  12. 13 Jan, 2013 1 commit
  13. 12 Jan, 2013 3 commits
    • Eryu Guan's avatar
      ext4: check bh in ext4_read_block_bitmap() · 15b49132
      Eryu Guan authored
      Validate the bh pointer before using it, since
      ext4_read_block_bitmap_nowait() might return NULL.
      
      I've seen this in fsfuzz testing.
      
       EXT4-fs error (device loop0): ext4_read_block_bitmap_nowait:385: comm touch: Cannot get buffer for block bitmap - block_group = 0, block_bitmap = 3925999616
       BUG: unable to handle kernel NULL pointer dereference at           (null)
       IP: [<ffffffff8121de25>] ext4_wait_block_bitmap+0x25/0xe0
       ...
       Call Trace:
        [<ffffffff8121e1e5>] ext4_read_block_bitmap+0x35/0x60
        [<ffffffff8125e9c6>] ext4_free_blocks+0x236/0xb80
        [<ffffffff811d0d36>] ? __getblk+0x36/0x70
        [<ffffffff811d0a5f>] ? __find_get_block+0x8f/0x210
        [<ffffffff81191ef3>] ? kmem_cache_free+0x33/0x140
        [<ffffffff812678e5>] ext4_xattr_release_block+0x1b5/0x1d0
        [<ffffffff812679be>] ext4_xattr_delete_inode+0xbe/0x100
        [<ffffffff81222a7c>] ext4_free_inode+0x7c/0x4d0
        [<ffffffff812277b8>] ? ext4_mark_inode_dirty+0x88/0x230
        [<ffffffff8122993c>] ext4_evict_inode+0x32c/0x490
        [<ffffffff811b8cd7>] evict+0xa7/0x1c0
        [<ffffffff811b8ed3>] iput_final+0xe3/0x170
        [<ffffffff811b8f9e>] iput+0x3e/0x50
        [<ffffffff812316fd>] ext4_add_nondir+0x4d/0x90
        [<ffffffff81231d0b>] ext4_create+0xeb/0x170
        [<ffffffff811aae9c>] vfs_create+0xac/0xd0
        [<ffffffff811ac845>] lookup_open+0x185/0x1c0
        [<ffffffff8129e3b9>] ? selinux_inode_permission+0xa9/0x170
        [<ffffffff811acb54>] do_last+0x2d4/0x7a0
        [<ffffffff811af743>] path_openat+0xb3/0x480
        [<ffffffff8116a8a1>] ? handle_mm_fault+0x251/0x3b0
        [<ffffffff811afc49>] do_filp_open+0x49/0xa0
        [<ffffffff811bbaad>] ? __alloc_fd+0xdd/0x150
        [<ffffffff8119da28>] do_sys_open+0x108/0x1f0
        [<ffffffff8119db51>] sys_open+0x21/0x30
        [<ffffffff81618959>] system_call_fastpath+0x16/0x1b
      
      Also fix comment for ext4_read_block_bitmap_nowait()
      Signed-off-by: default avatarEryu Guan <guaneryu@gmail.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      15b49132
    • Wang Shilong's avatar
      ext4: use unlikely to improve the efficiency of the kernel · aebf0243
      Wang Shilong authored
      Because the function 'sb_getblk' seldomly fails to return NULL
      value,it will be better to use 'unlikely' to optimize it.
      Signed-off-by: default avatarWang Shilong <wangsl-fnst@cn.fujitsu.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      aebf0243
    • Theodore Ts'o's avatar
      ext4: return ENOMEM if sb_getblk() fails · 860d21e2
      Theodore Ts'o authored
      The only reason for sb_getblk() failing is if it can't allocate the
      buffer_head.  So ENOMEM is more appropriate than EIO.  In addition,
      make sure that the file system is marked as being inconsistent if
      sb_getblk() fails.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      860d21e2