1. 04 Jun, 2013 9 commits
    • jbd2: transaction reservation support · 8f7d89f3
      Jan Kara authored
      In some cases we cannot start a transaction because of locking
      constraints and passing started transaction into those places is not
      handy either because we could block transaction commit for too long.
      Transaction reservation is designed to solve these issues.  It
      reserves a handle with given number of credits in the journal and the
      handle can be later attached to the running transaction without
      blocking on commit or checkpointing.  Reserved handles do not block
      transaction commit in any way, they only reduce maximum size of the
      running transaction (because we have to always be prepared to
      accommodate a request to attach a reserved handle).
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      8f7d89f3
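      The credit accounting described in this commit message can be sketched
      as a toy userspace model. This is only an illustration under stated
      assumptions: toy_journal, toy_reserve(), toy_start() and
      toy_attach_reserved() are hypothetical names, not jbd2 API, and a real
      journal would wait for a commit where this sketch simply returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of journal credits; not the jbd2 data structures. */
struct toy_journal {
    int max_credits;  /* maximum size of the running transaction */
    int reserved;     /* credits set aside for reserved handles */
    int outstanding;  /* credits of handles attached to the running transaction */
};

/* Set credits aside for a handle that will be attached later. */
static bool toy_reserve(struct toy_journal *j, int credits)
{
    if (j->outstanding + j->reserved + credits > j->max_credits)
        return false;
    j->reserved += credits;
    return true;
}

/*
 * Start an ordinary handle. Reserved credits shrink the room left for the
 * running transaction, so the limit is max_credits - reserved.
 */
static bool toy_start(struct toy_journal *j, int credits)
{
    if (j->outstanding + credits > j->max_credits - j->reserved)
        return false; /* a real journal would wait for commit here */
    j->outstanding += credits;
    return true;
}

/* Attach a reserved handle: cannot fail, because the room was kept free. */
static void toy_attach_reserved(struct toy_journal *j, int credits)
{
    assert(credits <= j->reserved);
    j->reserved -= credits;
    j->outstanding += credits;
}
```

      The invariant outstanding + reserved <= max_credits is what lets the
      attach step succeed without blocking in this model.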
    • jbd2: remove unused waitqueues · f29fad72
      Jan Kara authored
      j_wait_logspace and j_wait_checkpoint are unused.  Remove them.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      f29fad72
    • jbd2: fix race in t_outstanding_credits update in jbd2_journal_extend() · fe1e8db5
      Jan Kara authored
      jbd2_journal_extend() first checked whether transaction can accept
      extending handle with more credits and then added credits to
      t_outstanding_credits.  This can race with start_this_handle() adding
      another handle to a transaction and thus overbooking a transaction.
      Make jbd2_journal_extend() use atomic_add_return() to close the race.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      fe1e8db5
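      The "add first, then check the result" pattern that closes this kind of
      race can be sketched in userspace with C11 atomics. This is a minimal
      sketch, not the kernel code: toy_txn and toy_extend() are hypothetical
      names, and atomic_fetch_add() plus the addend stands in for the
      kernel's atomic_add_return().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transaction's credit counter. */
struct toy_txn {
    atomic_int outstanding_credits;
    int max_credits;
};

/*
 * Claim the credits atomically, then check the new total. A separate
 * "check, then add" sequence would let two callers both pass the check
 * and overbook the transaction together.
 */
static bool toy_extend(struct toy_txn *t, int nblocks)
{
    int wanted;

    assert(nblocks >= 0);
    /* atomic_fetch_add() returns the old value, so add nblocks for the new one */
    wanted = atomic_fetch_add(&t->outstanding_credits, nblocks) + nblocks;
    if (wanted > t->max_credits) {
        /* over the limit: give the credits back and report failure */
        atomic_fetch_sub(&t->outstanding_credits, nblocks);
        return false;
    }
    return true;
}
```

      Because the counter is updated in one atomic step, a concurrent caller
      always sees the credits that were already claimed.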
    • jbd2: cleanup needed free block estimates when starting a transaction · 76c39904
      Jan Kara authored
      __jbd2_log_space_left() and jbd_space_needed() were kind of odd.
      jbd_space_needed() accounted also credits needed for currently
      committing transaction while it didn't account for credits needed for
      control blocks.  __jbd2_log_space_left() then accounted for control
      blocks as a fraction of free space.  Since results of these two
      functions are always only compared against each other, this works
      correctly but is somewhat strange.  Move the estimates so that
      jbd_space_needed() returns number of blocks needed for a transaction
      including control blocks and __jbd2_log_space_left() returns free
      space in the journal (with the committing transaction already
      subtracted).  Rename functions to jbd2_log_space_left() and
      jbd2_space_needed() while we are changing them.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      76c39904
    • jbd2: remove outdated comment · 2f387f84
      Jan Kara authored
      The comment about credit estimates isn't true anymore. We do what the
      comment describes now.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      2f387f84
    • jbd2: refine waiting for shadow buffers · b34090e5
      Jan Kara authored
      Currently when we add a buffer to a transaction, we wait until the
      buffer is removed from BJ_Shadow list (so that we prevent any changes
      to the buffer that is just written to the journal).  This can take
      unnecessarily long as a lot happens between the time the buffer is
      submitted to the journal and the time when we remove the buffer from
      BJ_Shadow list.  (e.g.  We wait for all data buffers in the
      transaction, we issue a cache flush, etc.)  Also this creates a
      dependency of do_get_write_access() on transaction commit (namely
      waiting for data IO to complete) which we want to avoid when
      implementing transaction reservation.
      
      So we modify commit code to set new BH_Shadow flag when temporary
      shadowing buffer is created and we clear that flag once IO on that
      buffer is complete.  This allows do_get_write_access() to wait only
      for BH_Shadow bit and thus removes the dependency on data IO
      completion.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      b34090e5
    • jbd2: remove journal_head from descriptor buffers · e5a120ae
      Jan Kara authored
      Similarly as for metadata buffers, also log descriptor buffers don't
      really need the journal head. So strip it and remove BJ_LogCtl list.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      e5a120ae
    • jbd2: don't create journal_head for temporary journal buffers · f5113eff
      Jan Kara authored
      When writing metadata to the journal, we create temporary buffer heads
      for that task.  We also attach journal heads to these buffer heads but
      the only purpose of the journal heads is to keep buffers linked in
      transaction's BJ_IO list.  We remove the need for journal heads by
      reusing buffer_head's b_assoc_buffers list for that purpose.  Also
      since BJ_IO list is just a temporary list for transaction commit, we
      use a private list in jbd2_journal_commit_transaction() for that thus
      removing BJ_IO list from transaction completely.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      f5113eff
    • ext4: use io_end for multiple bios · 97a851ed
      Jan Kara authored
      Change writeback path to create just one io_end structure for the
      extent to which we submit IO and share it among bios writing that
      extent. This prevents needless splitting and joining of unwritten
      extents when they cannot be submitted as a single bio.
      
      Bugs in ENOMEM handling found by Linux File System Verification project
      (linuxtesting.org) and fixed by Alexey Khoroshilov
      <khoroshilov@ispras.ru>.
      
      CC: Alexey Khoroshilov <khoroshilov@ispras.ru>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      97a851ed
  2. 31 May, 2013 4 commits
  3. 28 May, 2013 13 commits
    • ext4: suppress ext4 orphan messages on mount · 566370a2
      Paul Taysom authored
      Suppress the messages relating to processing the ext4 orphan list
      ("truncating inode" and "deleting unreferenced inode") unless the
      debug option is on, since otherwise they end up taking up space in the
      log that could be used for more useful information.
      
      Tested by opening several files, unlinking them, then
      crashing the system, rebooting the system and examining
      /var/log/messages.
      
      Addresses the problem described in http://crbug.com/220976
      Signed-off-by: Paul Taysom <taysom@chromium.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      566370a2
    • jbd2: fix block tag checksum verification brokenness · eee06c56
      Darrick J. Wong authored
      Al Viro complained of a ton of bogosity with regards to the jbd2 block
      tag header checksum.  This one checksum is 16 bits, so cut off the
      upper 16 bits and treat it as a 16-bit value and don't mess around
      with be32* conversions.  Fortunately metadata checksumming is still
      "experimental" and not in a shipping e2fsprogs, so there should be few
      users affected by this.
      Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      eee06c56
    • jbd2: use kmem_cache_zalloc for allocating journal head · 5d9cf9c6
      Zheng Liu authored
      This commit tries to use kmem_cache_zalloc instead of kmem_cache_alloc/
      memset when a new journal head is allocated.
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      5d9cf9c6
    • ext4: make punch hole code path work with bigalloc · d23142c6
      Lukas Czerner authored
      Currently punch hole is disabled in file systems with bigalloc
      feature enabled. However the recent changes in punch hole patch should
      make it easier to support punching holes on bigalloc enabled file
      systems.
      
      This commit changes partial_cluster handling in ext4_remove_blocks(),
      ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
      partial_cluster is unsigned long long type and it makes sure that we
      will free the partial cluster if all extents have been released from that
      cluster. However it has been specifically designed only for truncate.
      
      With punch hole we can be freeing just some extents in the cluster
      leaving the rest untouched. So we have to make sure that we will notice
      cluster which still has some extents. To do this I've changed
      partial_cluster to be signed long long type. The only scenario where
      this could be a problem is when cluster_size == block size, however in
      that case there would not be any partial clusters so we're safe. For
      bigger clusters the signed type is enough. Now we use the negative value
      in partial_cluster to mark such cluster used, hence we know that we must
      not free it even if all other extents have been freed from such cluster.
      
      This scenario can be described in simple diagram:
      
      |FFF...FF..FF.UUU|
       ^----------^
        punch hole
      
      . - free space
      | - cluster boundary
      F - freed extent
      U - used extent
      
      Also update respective tracepoints to use signed long long type for
      partial_cluster.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      d23142c6
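      The sign convention described in this commit message can be sketched as
      follows. This is a simplified illustration, not the ext4 code:
      toy_partial_t and the toy_* helpers are hypothetical, and the sketch
      assumes cluster numbers are nonzero so that 0 can mean "no partial
      cluster".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Signed on purpose: the sign carries the "still in use" mark. */
typedef long long toy_partial_t;

/* Positive value: cluster is a candidate for freeing. */
static toy_partial_t toy_mark_pending(unsigned long long cluster)
{
    assert(cluster > 0 && cluster <= LLONG_MAX);
    return (toy_partial_t)cluster;
}

/* Negative value: cluster still holds used extents and must be kept. */
static toy_partial_t toy_mark_in_use(unsigned long long cluster)
{
    assert(cluster > 0 && cluster <= LLONG_MAX);
    return -(toy_partial_t)cluster;
}

/* True only when this cluster was recorded as a freeing candidate. */
static bool toy_may_free(toy_partial_t partial, unsigned long long cluster)
{
    return partial > 0 && (unsigned long long)partial == cluster;
}

/* True when this cluster was recorded as still in use. */
static bool toy_must_keep(toy_partial_t partial, unsigned long long cluster)
{
    return partial < 0 && (unsigned long long)(-partial) == cluster;
}
```

      In the diagram above, the cluster ends in a used extent, so a punch
      hole covering only its start would record it with the negative mark.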
    • ext4: update ext4_ext_remove_space trace point · 61801325
      Lukas Czerner authored
      Add "end" variable.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      61801325
    • ext4: remove unused code from ext4_remove_blocks() · 78fb9cdf
      Lukas Czerner authored
      The "head removal" branch in the condition is never used in any code
      path in ext4 since the function's only caller, ext4_ext_rm_leaf(), will make
      sure that the extent is properly split before removing blocks. Note that
      there is a bug in this branch anyway.
      
      This commit removes the unused code completely and makes use of
      ext4_error() instead of printk if dubious range is provided.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      78fb9cdf
    • ext4: remove unused discard_partial_page_buffers · c121ffd0
      Lukas Czerner authored
      The discard_partial_page_buffers is no longer used anywhere so we can
      simply remove it including the *_no_lock variant and
      EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      c121ffd0
    • ext4: use ext4_zero_partial_blocks in punch_hole · a87dd18c
      Lukas Czerner authored
      We're going to get rid of ext4_discard_partial_page_buffers() since it is
      duplicating some code and also partially duplicating work of
      truncate_pagecache_range(), moreover the old implementation was much
      clearer.
      
      Now when the truncate_inode_pages_range() can handle truncating non page
      aligned regions we can use this to invalidate and zero out block aligned
      region of the punched out range and then use ext4_block_truncate_page()
      to zero the unaligned blocks on the start and end of the range. This
      will greatly simplify the punch hole code. Moreover after this commit we
      can get rid of the ext4_discard_partial_page_buffers() completely.
      
      We also introduce function ext4_prepare_punch_hole() to do some common
      operations before we attempt to do the actual punch hole on
      indirect or extent file which saves us some code duplication.
      
      This has been tested on ppc64 with 1k block size with fsx and xfstests
      without any problems.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      a87dd18c
    • ext4: truncate_inode_pages() in orphan cleanup path · 55f252c9
      Lukas Czerner authored
      Currently we do not tell mm to zero out tail of the page before truncate
      in orphan_cleanup(). This is ok, because the page should not be
      uptodate, however this may eventually change and it might cause problems.
      
      Call truncate_inode_pages() as precautionary measure. Thanks Jan Kara
      for pointing this out.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      55f252c9
    • Revert "ext4: fix fsx truncate failure" · eb3544c6
      Lukas Czerner authored
      This reverts commit 189e868f.
      
      This commit reintroduces the use of ext4_block_truncate_page() in ext4
      truncate operation instead of ext4_discard_partial_page_buffers().
      
      The statement in the commit description that the truncate operation only
      zeroes the block-unaligned portion of the last page is not exactly right,
      since truncate_pagecache_range() also zeroes and invalidates the unaligned
      portion of the page. Then there is no need to zero and unmap it once more
      and ext4_block_truncate_page() was doing the right job, although we
      still need to update the buffer head containing the last block, which is
      exactly what ext4_block_truncate_page() is doing.
      
      Moreover the problem described in the commit is fixed more properly with
      commit
      
      15291164
      	jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer
      
      This was tested on ppc64 machine with block size of 1024 bytes without
      any problems.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      eb3544c6
    • ext4: Call ext4_jbd2_file_inode() after zeroing block · 0713ed0c
      Lukas Czerner authored
      In data=ordered mode we should call ext4_jbd2_file_inode() so that crash
      after the truncate transaction has committed does not expose stale data
      in the tail of the block.
      
      Thanks Jan Kara for pointing that out.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      0713ed0c
    • Revert "ext4: remove no longer used functions in inode.c" · d863dc36
      Lukas Czerner authored
      This reverts commit ccb4d7af.
      
      This commit reintroduces functions ext4_block_truncate_page() and
      ext4_block_zero_page_range() which had previously been removed in favour
      of ext4_discard_partial_page_buffers().
      
      In future commits we want to reintroduce those functions and remove
      ext4_discard_partial_page_buffers() since it is duplicating some code
      and also partially duplicating work of truncate_pagecache_range(),
      moreover the old implementation was much clearer.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      d863dc36
    • mm: teach truncate_inode_pages_range() to handle non page aligned ranges · 5a720394
      Lukas Czerner authored
      This commit changes truncate_inode_pages_range() so it can handle non
      page aligned regions of the truncate. Currently we can hit BUG_ON when
      the end of the range is not page aligned, but we can handle unaligned
      start of the range.
      
      Being able to handle non page aligned regions of the page can help file
      system punch_hole implementations and save some work, because once we're
      holding the page we might as well deal with it right away.
      
      In previous commits we've changed ->invalidatepage() prototype to accept
      'length' argument to be able to specify range to invalidate. Now we can
      use that new ability in truncate_inode_pages_range().
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      5a720394
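      Handling a range whose start and end are not page aligned comes down to
      splitting it into whole pages plus two partial pages. The arithmetic can
      be sketched as below. This is an illustrative sketch only: the 4096-byte
      page size, struct toy_range and toy_split_range() are assumptions, and
      the end offset is treated as inclusive.

```c
#include <assert.h>
#include <limits.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ULL << TOY_PAGE_SHIFT)

/* Result of splitting the byte range [lstart, lend] along page boundaries. */
struct toy_range {
    unsigned long long first_full; /* index of the first whole page in range */
    unsigned long long end_full;   /* index one past the last whole page */
    unsigned int partial_start;    /* offset in the first page where zeroing starts; 0 if aligned */
    unsigned int partial_end;      /* bytes to zero at the start of page end_full; 0 if aligned */
};

static struct toy_range toy_split_range(unsigned long long lstart,
                                        unsigned long long lend)
{
    struct toy_range r;

    /* lend is inclusive, and lend + 1 must not overflow */
    assert(lstart <= lend && lend < ULLONG_MAX);
    r.partial_start = (unsigned int)(lstart & (TOY_PAGE_SIZE - 1));
    r.partial_end = (unsigned int)((lend + 1) & (TOY_PAGE_SIZE - 1));
    /* round the start up and the end down to whole pages */
    r.first_full = (lstart + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT;
    r.end_full = (lend + 1) >> TOY_PAGE_SHIFT;
    return r;
}
```

      When first_full is not less than end_full, the range contains no whole
      page and only the partial pages need zeroing.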
  4. 22 May, 2013 9 commits
  5. 20 May, 2013 5 commits