1. 11 Jul, 2011 8 commits
    • ext4: Change the wrong param comment for ext4_trim_all_free · 22612283
      Tao Ma authored
      In the ext4_trim_all_free() comment, there is no longer an @e4b
      parameter; it is now @group.
      Reported-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Speed up FITRIM by recording flags in ext4_group_info · 3d56b8d2
      Tao Ma authored
      In ext4, every time FITRIM is called, we iterate over all the
      groups and trim them one by one. This wastes time if a group
      has already been trimmed and has not changed since the last
      trim.
      
      So this patch adds a new flag in ext4_group_info->bb_state to
      indicate that the group has been trimmed. The flag is cleared
      whenever blocks in the group are freed (in release_blocks_on_commit).
      A new field, trim_minlen, is also added to ext4_sb_info to record the
      last minlen used to trim the volume, so that if the caller provides a
      smaller one, we go ahead with the trim regardless of bb_state.
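The skip rule described above can be modelled in a few lines of plain C. This is a minimal sketch, not ext4's code. The structs, the bit number and the helpers need_trim(), mark_group_trimmed() and clear_group_trimmed() are all hypothetical stand-ins for ext4_group_info->bb_state and ext4_sb_info->trim_minlen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit in bb_state: "trimmed, nothing freed since". */
#define TOY_GROUP_WAS_TRIMMED	0

/* Toy stand-in for ext4_group_info. */
struct toy_group_info {
	unsigned long bb_state;
};

/* Toy stand-in for ext4_sb_info: minlen used by the last trim. */
struct toy_sb_info {
	uint64_t trim_minlen;
};

static bool group_was_trimmed(const struct toy_group_info *grp)
{
	return grp->bb_state & (1UL << TOY_GROUP_WAS_TRIMMED);
}

/* Set after the group has been trimmed. */
static void mark_group_trimmed(struct toy_group_info *grp)
{
	grp->bb_state |= 1UL << TOY_GROUP_WAS_TRIMMED;
}

/* Cleared when blocks in the group are freed (cf. release_blocks_on_commit). */
static void clear_group_trimmed(struct toy_group_info *grp)
{
	grp->bb_state &= ~(1UL << TOY_GROUP_WAS_TRIMMED);
}

/*
 * Trim the group if it was never trimmed (or blocks were freed since),
 * or if the caller asks for a smaller minlen than last time, because
 * extents that were too short before may qualify now.
 */
static bool need_trim(const struct toy_group_info *grp,
		      const struct toy_sb_info *sbi, uint64_t minlen)
{
	return !group_was_trimmed(grp) || minlen < sbi->trim_minlen;
}
```

Skipping is safe when the new minlen is equal or larger. Every extent that would qualify now also qualified last time and was already trimmed, and nothing has been freed since.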
      
      A simple test with my Intel X25-M SSD:
      df -h shows:
      /dev/sdb1              40G   21G   17G  56% /mnt/ext4
      Block size:               4096
      
      Run FITRIM with the following parameters:
      range.start = 0;
      range.len = UINT64_MAX;
      range.minlen = 1048576;
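The ./ftrim program used for the timings below is not included in the commit. A plausible minimal version is sketched here. It assumes the tool opens a file on the target filesystem and issues the FITRIM ioctl from <linux/fs.h> with the parameters above. run_fitrim() is a hypothetical name. FITRIM requires CAP_SYS_ADMIN. On success, the kernel writes the number of bytes trimmed back into range.len.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FITRIM, struct fstrim_range */

/*
 * Issue FITRIM on the filesystem that contains `path`, covering the whole
 * volume. Returns 0 and stores the bytes trimmed in *trimmed on success,
 * or -1 on failure (errno is left set by open() or ioctl()).
 */
static int run_fitrim(const char *path, uint64_t minlen, uint64_t *trimmed)
{
	struct fstrim_range range;
	int fd, ret;

	memset(&range, 0, sizeof(range));
	range.start = 0;
	range.len = UINT64_MAX;		/* whole filesystem */
	range.minlen = minlen;		/* 1048576 in the test above */

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, FITRIM, &range);
	close(fd);
	if (ret < 0)
		return -1;

	if (trimmed)
		*trimmed = range.len;	/* kernel reports bytes trimmed here */
	return 0;
}
```

A main() would call run_fitrim(argv[1], 1048576, &trimmed) and print trimmed.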
      
      without the patch:
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.505s
      user	0m0.000s
      sys	0m1.224s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.359s
      user	0m0.000s
      sys	0m1.178s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.228s
      user	0m0.000s
      sys	0m1.151s
      
      with the patch:
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.625s
      user	0m0.000s
      sys	0m1.269s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m0.002s
      user	0m0.000s
      sys	0m0.001s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m0.002s
      user	0m0.000s
      sys	0m0.001s
      
      A big improvement for the 2nd and 3rd runs.
      
      Even after I delete some big image files, it is still much
      faster than iterating over the whole disk.
      
      [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
      real	0m1.217s
      user	0m0.000s
      sys	0m0.196s
      
      Cc: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Add new ext4 trim tracepoints · b3d4c2b1
      Tao Ma authored
      Add ext4_trim_extent and ext4_trim_all_free.
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: speed up group trim with the right free block count · 169ddc3e
      Tao Ma authored
      When we trim free blocks in an ext4 group, we need to count the
      free blocks properly and check whether enough free blocks are left
      for us to trim. The current solution only subtracts free extents
      that are large enough to be trimmed, which isn't appropriate.
      
      Let us see a small example:
      a group has 1.5M free, split into five 300K chunks, and minblocks is 1M.
      With the current solution, we have to iterate over the whole group,
      since these 300K chunks are never subtracted from 1.5M. But we should
      actually exit after finding the first 2 free chunks. Once their 600K
      is subtracted (even though those chunks can't be trimmed), the
      remaining 3 chunks sum to only 900K, which is less than minblocks.
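The counting rule can be modelled with a toy loop over free-extent lengths. This is a sketch under simplifying assumptions, not the kernel's ext4_trim_all_free(). walk_free_extents() and its subtract_all switch are hypothetical. The switch exists only to contrast the old behaviour with the new one on the five-chunk example.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy walk over the free extents of one group (lengths in arbitrary units,
 * K in the example above). free_count starts as the group's total free
 * space. With subtract_all == false, only extents long enough to trim are
 * subtracted (old behaviour). With subtract_all == true, every examined
 * extent is subtracted (fixed behaviour).
 * Returns the number of extents examined. *trimmed receives the total
 * length of extents long enough to trim.
 */
static size_t walk_free_extents(const unsigned long *extents, size_t n,
				unsigned long free_count, unsigned long minlen,
				bool subtract_all, unsigned long *trimmed)
{
	size_t examined = 0;

	*trimmed = 0;
	/* Stop once the unexamined free space cannot hold a trimmable extent. */
	while (examined < n && free_count >= minlen) {
		unsigned long len = extents[examined++];

		if (len >= minlen) {
			*trimmed += len;	/* a discard would be issued here */
			free_count -= len;
		} else if (subtract_all) {
			free_count -= len;
		}
	}
	return examined;
}
```

With five 300K chunks, 1.5M free and a 1M minimum, the old rule examines all 5 chunks. The fixed rule stops after 2.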
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix trim length underflow with small trim length · 22f10457
      Tao Ma authored
      In 0f0a25bf, we adjust 'len' by s_first_data_block - start, but this
      can underflow when blocksize=1K, fstrim_range.len=512 and
      fstrim_range.start=0. With 1K blocks, first_data_blk is 1 while len
      is 0 blocks, so when we run
      len -= first_data_blk - start;
      len underflows to -1ULL. The later last_group check still keeps us
      safe by limiting the trim to the whole volume, but that isn't what
      the user really wants.
      
      This patch fixes it. It also adds a check on 'start', as ext3 does,
      so that we can bail out immediately if start is invalid.
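Here is a sketch of the kind of check this fix introduces, working in block units. The helper name, its return convention and the max_blks parameter are assumptions for illustration. This is not the exact patch. The check compares len against the gap before subtracting, so the subtraction can no longer wrap.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Clamp a trim request [start, start + len), in blocks, so that it does not
 * begin before the first data block.
 * Returns 0 if something is left to trim (start and len are updated),
 * 1 if the whole range lies before the first data block,
 * -EINVAL if start is at or beyond the end of the filesystem.
 */
static int clamp_trim_range(uint64_t *start, uint64_t *len,
			    uint64_t first_data_blk, uint64_t max_blks)
{
	if (*start >= max_blks)
		return -EINVAL;

	if (*start < first_data_blk) {
		uint64_t gap = first_data_blk - *start;

		/* An unconditional "len -= gap" gives 0 - 1 == UINT64_MAX. */
		if (*len <= gap)
			return 1;
		*len -= gap;
		*start = first_data_blk;
	}
	return 0;
}
```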
      
      Cc: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: add tracepoint for ext4_journal_start · 12706394
      Theodore Ts'o authored
      This will help debug who is responsible for starting a jbd2 transaction.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: remove jbd2_dev_to_name() from jbd2 tracepoints · 4862fd60
      Theodore Ts'o authored
      Using function calls in TP_printk causes perf heartburn, so print the
      MAJOR/MINOR device numbers instead.
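For reference, the kernel's internal dev_t encoding (include/linux/kdev_t.h) stores the minor number in the low 20 bits. MAJOR() and MINOR() are therefore just a shift and a mask on the recorded value, with no name lookup. The sketch below re-creates that arithmetic in userspace under the hypothetical names TOY_MAJOR and TOY_MINOR. It is not the tracepoint code, and the "%u,%u" output format is an assumption.

```c
#include <stdint.h>
#include <stdio.h>

/* Same layout as the kernel's internal dev_t: 12-bit major, 20-bit minor. */
#define TOY_MINORBITS		20
#define TOY_MINORMASK		((1U << TOY_MINORBITS) - 1)
#define TOY_MAJOR(dev)		((unsigned int)((dev) >> TOY_MINORBITS))
#define TOY_MINOR(dev)		((unsigned int)((dev) & TOY_MINORMASK))
#define TOY_MKDEV(ma, mi)	(((uint32_t)(ma) << TOY_MINORBITS) | (mi))

/* Format a device as "major,minor" using only arithmetic on the number. */
static int format_dev(char *buf, size_t size, uint32_t dev)
{
	return snprintf(buf, size, "%u,%u", TOY_MAJOR(dev), TOY_MINOR(dev));
}
```

For example, /dev/sdb1 is conventionally 8,17.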
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: free allocated and pre-allocated blocks when check_eofblocks_fl fails · 575a1d4b
      Jiaying Zhang authored
      On a corrupted inode or a disk failure, we may fail after we have
      already allocated some blocks for the inode or taken some blocks from
      the inode's preallocation list, but before we have successfully
      inserted the corresponding extent into the extent tree. In this case,
      we should free any allocated blocks and discard the inode's
      preallocated blocks, because the entries in the inode's preallocation
      list may be in an inconsistent state.
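The cleanup described above follows the usual goto-unwind pattern. The toy model below is a sketch, not ext4 code. toy_map_blocks(), the toy structs and the simulated check_eofblocks_fl() failure are all hypothetical. In this model, "freeing" only returns counts to a free pool, and allocation always comes from that pool.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy filesystem: just a pool of free blocks. */
struct toy_fs {
	unsigned long free_blocks;
};

/* Toy inode: blocks on its preallocation list and blocks in its extents. */
struct toy_inode {
	unsigned long prealloc;
	unsigned long mapped;
};

/* Simulated check_eofblocks_fl(): fails when `corrupt` is set. */
static int toy_check_eofblocks(bool corrupt)
{
	return corrupt ? -EIO : 0;
}

static int toy_map_blocks(struct toy_fs *fs, struct toy_inode *inode,
			  unsigned long want, bool corrupt)
{
	int err;

	if (want > fs->free_blocks)
		return -ENOSPC;
	fs->free_blocks -= want;		/* blocks are now allocated */

	err = toy_check_eofblocks(corrupt);	/* can fail after allocation */
	if (err)
		goto out_free;

	inode->mapped += want;			/* "insert the extent" */
	return 0;

out_free:
	fs->free_blocks += want;		/* free what we just allocated */
	fs->free_blocks += inode->prealloc;	/* discard the prealloc list */
	inode->prealloc = 0;
	return err;
}
```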
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  2. 10 Jul, 2011 1 commit
    • ext4: fix i_blocks/quota accounting when extent insertion fails · 7132de74
      Maxim Patlasov authored
      The current implementation of ext4_free_blocks() always calls
      dquot_free_block(). This looks quite sensible in most cases: the blocks
      to be freed are associated with the inode and were accounted in quota
      and i_blocks some time ago.
      
      However, there is a case where the blocks being freed have not yet
      been accounted for by the time ext4_free_blocks() is called:
      
      1. delalloc is on, write_begin pre-allocated some space in quota
      2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
      3. then ext4_ext_map_blocks() gets an error (e.g. ENOSPC) from
         ext4_ext_insert_extent() and calls ext4_free_blocks().
      
      In this scenario, ext4_free_blocks() calls dquot_free_block(), which in
      turn decrements i_blocks for blocks that were not yet accounted for (due
      to delalloc). After a clean umount, e2fsck reports something like:
      
      > Inode 21, i_blocks is 5080, should be 5128.  Fix<y>?
      because i_blocks was erroneously decremented as explained above.
      
      The patch fixes the problem by passing the new flag
      EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
      that the dquot_free_block() call be skipped.
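The effect of the new flag can be modelled as follows. The flag value, the struct and toy_free_blocks() are hypothetical. Only the flag's role, skipping the dquot_free_block() accounting step, comes from the commit. For simplicity, i_blocks here counts filesystem blocks rather than 512-byte sectors.

```c
/* Hypothetical value; modelled on EXT4_FREE_BLOCKS_NO_QUOT_UPDATE. */
#define TOY_FREE_BLOCKS_NO_QUOT_UPDATE	0x0008

/* Toy inode: i_blocks counts filesystem blocks in this model. */
struct acct_inode {
	unsigned long i_blocks;
};

/*
 * Release `count` blocks. Returning them to the allocator is omitted.
 * The accounting step (what dquot_free_block() does to i_blocks and quota)
 * is skipped when the caller says the blocks were never charged.
 */
static void toy_free_blocks(struct acct_inode *inode, unsigned long count,
			    int flags)
{
	if (!(flags & TOY_FREE_BLOCKS_NO_QUOT_UPDATE))
		inode->i_blocks -= count;
}
```

In the delalloc failure path, the blocks were never added to i_blocks, so passing the flag leaves the count unchanged. Without the flag, i_blocks ends up lower than it should be, which is the mismatch e2fsck reports.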
      Signed-off-by: Maxim Patlasov <maxim.patlasov@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  3. 30 Jun, 2011 1 commit
  4. 28 Jun, 2011 2 commits
  5. 27 Jun, 2011 7 commits
    • ext4: move ext4_ind_* functions from inode.c to indirect.c · dae1e52c
      Amir Goldstein authored
      This patch moves functions from inode.c to indirect.c.
      The moved functions are ext4_ind_* functions and their helpers.
      Functions called from inode.c are declared extern.
      Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: move common truncate functions to header file · 9f125d64
      Theodore Ts'o authored
      Move two functions that will be needed both by the indirect functions
      (which are about to be moved to indirect.c) and by inode.c into
      truncate.h as inline functions. This avoids duplicate copies of the
      functions (which can be a maintenance problem) without having to
      expose them as global functions.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: move __ext4_check_blockref to block_validity.c · 1f7d1e77
      Theodore Ts'o authored
      In preparation for moving the indirect functions to a separate file,
      move __ext4_check_blockref() to block_validity.c and rename it to
      ext4_check_blockref(), which is exported as a globally visible function.
      
      Also, rename the cpp macro ext4_check_inode_blockref() to
      ext4_ind_check_inode(), to make it clear that it is only valid for use
      with non-extent mapped inodes.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: rename ext4_indirect_* funcs to ext4_ind_* · 8bb2b247
      Amir Goldstein authored
      We are going to move all ext4_ind_* functions to indirect.c.
      Before we do that, let's rename 2 functions called ext4_indirect_*
      to ext4_ind_*, to keep to the naming convention.
      Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: split ext4_ind_truncate from ext4_truncate · ff9893dc
      Amir Goldstein authored
      We are about to move all indirect inode functions to a new file.
      Before we do that, let's split ext4_ind_truncate() out of ext4_truncate()
      leaving only generic code in the latter, so we will be able to move
      ext4_ind_truncate() to the new file.
      Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix incorrect error msg in ext4_ext_insert_index · ed7a7e16
      Robin Dong authored
      In ext4_ext_insert_index(), when eh_entries of curp is bigger than
      eh_max, an error message is printed, but its content is about logical
      and ei_block, which is incorrect.
      Signed-off-by: Robin Dong <sanbai@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: use WRITE_SYNC in journal checkpoint · d3ad8434
      Tao Ma authored
      During a journal checkpoint, we write the buffer and wait for it to
      complete. But in cfq the async queue has a very low priority, and in
      our test, if there are too many sync queues and every queue is filled
      up with requests, the write request is delayed for quite a long time
      and all the tasks waiting for journal space end up with errors like:
      
      INFO: task attr_set:3816 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      attr_set      D ffff880028393480     0  3816      1 0x00000000
       ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
       ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
       ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
      Call Trace:
       [<ffffffff8103e456>] ? __dequeue_entity+0x33/0x38
       [<ffffffff8103caad>] ? need_resched+0x23/0x2d
       [<ffffffff814006a6>] ? thread_return+0xa2/0xbc
       [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
       [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
       [<ffffffff81400d31>] __mutex_lock_common+0x14e/0x1a9
       [<ffffffffa021dbfb>] ? brelse+0x13/0x15 [ext4]
       [<ffffffff81400ddb>] __mutex_lock_slowpath+0x19/0x1b
       [<ffffffff81400b2d>] mutex_lock+0x1b/0x32
       [<ffffffffa01f927b>] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
       [<ffffffffa01f547b>] start_this_handle+0x438/0x527 [jbd2]
       [<ffffffff8106f491>] ? autoremove_wake_function+0x0/0x3e
       [<ffffffffa01f560b>] jbd2_journal_start+0xa1/0xcc [jbd2]
       [<ffffffffa02353be>] ext4_journal_start_sb+0x57/0x81 [ext4]
       [<ffffffffa024a314>] ext4_xattr_set+0x6c/0xe3 [ext4]
       [<ffffffffa024aaff>] ext4_xattr_user_set+0x42/0x4b [ext4]
       [<ffffffff81145adb>] generic_setxattr+0x6b/0x76
       [<ffffffff81146ac0>] __vfs_setxattr_noperm+0x47/0xc0
       [<ffffffff81146bb8>] vfs_setxattr+0x7f/0x9a
       [<ffffffff81146c88>] setxattr+0xb5/0xe8
       [<ffffffff81137467>] ? do_filp_open+0x571/0xa6e
       [<ffffffff81146d26>] sys_fsetxattr+0x6b/0x91
       [<ffffffff81002d32>] system_call_fastpath+0x16/0x1b
      
      So this patch uses WRITE_SYNC in __flush_batch so that the request is
      moved into the sync queue and handled by cfq in a timely manner. We
      also use the new plugging mechanism, so that all the WRITE_SYNC
      requests are submitted together when we unplug.
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Reported-by: Robin Dong <sanbai@taobao.com>
  6. 21 Jun, 2011 8 commits
    • Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 890879cf
      Linus Torvalds authored
      * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        jbd2: Fix oops in jbd2_journal_remove_journal_head()
        jbd2: Remove obsolete parameters in the comments for some jbd2 functions
        ext4: fixed tracepoints cleanup
        ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap
        ext4: Fix max file size and logical block counting of extent format file
        ext4: correct comments for ext4_free_blocks()
    • Linux 3.0-rc4 · 56299378
      Linus Torvalds authored
    • vfs: i_state needs to be 'unsigned long' for now · 79568f5b
      Linus Torvalds authored
      Commit 13e12d14 ("vfs: reorganize 'struct inode' layout a bit")
      moved things around a bit and changed i_state to be unsigned int instead of
      unsigned long.  That was to help structure layout for the 64-bit case,
      and shrink 'struct inode' a bit (admittedly that only happened when
      spinlock debugging was on and i_flags didn't pack with i_lock).
      
      However, Meelis Roos reports that this results in unaligned exceptions
      on sparc, and it turns out that the bit-locking primitives that we use
      for the I_NEW bit want to use the bitops.  Which want 'unsigned long',
      not 'unsigned int'.
      
      We really should fix the bit locking code to not have that kind of
      requirement, but that's a much bigger change.  So for now, revert that
      field back to 'unsigned long' (but keep the other re-ordering changes
      from the commit that caused this).
      
      Andi points out that we have played games with this in 'struct page', so
      it's solvable with other hacks too, but since right now the struct inode
      size advantage only happens with some rare config options, it's not
      worth fighting.
      
      It _would_ be worth fixing the bitlocking code, though.  Especially
      since there is no type safety in the bitlocking code (this never caused
      any warnings, and worked fine on x86-64, because the bitlocks take a
      'void *' and x86-64 doesn't care that deeply about alignment).  So it's
      currently a very easy problem to trigger by mistake and never notice.
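The type-safety point can be illustrated with a bit helper that takes `unsigned long *` instead of `void *`. With that signature, passing the address of an `unsigned int` i_state field produces an incompatible-pointer-type diagnostic at compile time. It does not silently compile into a word-sized access of the wrong width. The names below are hypothetical, and the helpers are non-atomic. This is a sketch, not the kernel's bitops or bit-lock API.

```c
#include <limits.h>
#include <stdbool.h>

#define TOY_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical I_NEW-style bit number. */
#define TOY_I_NEW	3

/* Toy inode: i_state is unsigned long, as the commit restores. */
struct bit_inode {
	unsigned long i_state;
};

/* Non-atomic typed helpers; the pointer type enforces the word size. */
static inline void toy_set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / TOY_BITS_PER_LONG] |= 1UL << (nr % TOY_BITS_PER_LONG);
}

static inline bool toy_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / TOY_BITS_PER_LONG] >> (nr % TOY_BITS_PER_LONG)) & 1UL;
}
```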
      Reported-by: Meelis Roos <mroos@linux.ee>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 · f5fc5567
      Linus Torvalds authored
      * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
        drm/radeon/kms/r6xx+: voltage fixes
        drm/nouveau: drop leftover debugging
        drm/radeon: avoid warnings from r600/eg irq handlers on powered off card.
        drm/radeon/kms: add missing param for dce3.2 DP transmitter setup
        drm/radeon/kms/atom: fix duallink on some early DCE3.2 cards
        drm/nouveau: fix assumption that semaphore dmaobj is valid in x-chan sync
        drm/nv50/disp: fix gamma with page flipping overlay turned on
        drm/nouveau/pm: Prevent overflow in nouveau_perf_init()
        drm/nouveau: fix big-endian switch
    • Merge branch 'msm-fix' of git://codeaurora.org/quic/kernel/davidb/linux-msm · 85d45ade
      Linus Torvalds authored
      * 'msm-fix' of git://codeaurora.org/quic/kernel/davidb/linux-msm:
        msm: timer: Fix DGT rate on 8960 and 8660
        msm: timer: compensate for timer shift in msm_read_timer_count
        msm: timer: Fix SMP build error
    • Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux · eda08410
      Linus Torvalds authored
      * 'for-2.6.40' of git://linux-nfs.org/~bfields/linux:
        nfsd4: fix break_lease flags on nfsd open
        nfsd: link returns nfserr_delay when breaking lease
        nfsd: v4 support requires CRYPTO
        nfsd: fix dependency of nfsd on auth_rpcgss
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 6e158d21
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
        pxa168_eth: fix race in transmit path.
        ipv4, ping: Remove duplicate icmp.h include
        netxen: fix race in skb->len access
        sgi-xp: fix a use after free
        hp100: fix an skb->len race
        netpoll: copy dev name of slaves to struct netpoll
        ipv4: fix multicast losses
        r8169: fix static initializers.
        inet_diag: fix inet_diag_bc_audit()
        gigaset: call module_put before restart of if_open()
        farsync: add module_put to error path in fst_open()
        net: rfs: enable RFS before first data packet is received
        fs_enet: fix freescale FCC ethernet dp buffer alignment
        netdev: bfin_mac: fix memory leak when freeing dma descriptors
        vlan: don't call ndo_vlan_rx_register on hardware that doesn't have vlan support
        caif: Bugfix - XOFF removed channel from caif-mux
        tun: teach the tun/tap driver to support netpoll
        dp83640: drop PHY status frames in the driver.
        dp83640: fix phy status frame event parsing
        phylib: Allow BCM63XX PHY to be selected only on BCM63XX.
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 · 36698206
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
        devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
        fix comment in generic_permission()
        kill obsolete comment for follow_down()
        proc_sys_permission() is OK in RCU mode
        reiserfs_permission() doesn't need to bail out in RCU mode
        proc_fd_permission() is doesn't need to bail out in RCU mode
        nilfs2_permission() doesn't need to bail out in RCU mode
        logfs doesn't need ->permission() at all
        coda_ioctl_permission() is safe in RCU mode
        cifs_permission() doesn't need to bail out in RCU mode
        bad_inode_permission() is safe from RCU mode
        ubifs: dereferencing an ERR_PTR in ubifs_mount()
  7. 20 Jun, 2011 13 commits