1. 18 Jun, 2003 40 commits
    • [PATCH] JBD: journal_try_to_free_buffers race fix · b55d3305
      Andrew Morton authored
      There is a race between transaction commit's attempt to free journal_heads
      and journal_try_to_free_buffers' attempt.
      
      Fix that by taking a ref against the journal_head in
      journal_try_to_free_buffers().
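
      The idea of pinning an object with a reference while working on it can be
      sketched as below.  This is a single-threaded illustration only: the names
      jh_sketch, jh_get, jh_put and try_to_free are hypothetical, not the JBD API,
      and real code would need locking or atomic refcounts.

```c
#include <assert.h>

/* Hypothetical stand-in for a journal_head; not the real JBD structure. */
struct jh_sketch {
    int refcount;   /* number of users currently pinning the object */
    int freed;      /* set instead of calling free(), so callers can observe it */
};

/* Take a reference so that another path's put cannot free the object. */
static void jh_get(struct jh_sketch *jh)
{
    jh->refcount++;
}

/* Drop a reference; the last put "frees" the object.  Returns 1 if freed. */
static int jh_put(struct jh_sketch *jh)
{
    assert(jh->refcount > 0);
    if (--jh->refcount == 0) {
        jh->freed = 1;
        return 1;
    }
    return 0;
}

/* The "try to free" path pins the object for as long as it inspects it. */
static int try_to_free(struct jh_sketch *jh)
{
    int freed;

    jh_get(jh);             /* pin: nobody else can free it under us */
    /* ... inspect state that would be unsafe to touch without the pin ... */
    freed = jh_put(jh);     /* unpin; frees only if we held the last ref */
    return freed;
}
```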
    • [PATCH] ext3: fix data=journal mode · de285c52
      Andrew Morton authored
      ext3's fully data-journalled mode has been broken for a year.  This patch
      fixes it up.
      
      The prepare_write/commit_write/writepage implementations have been split up.
      Instead of having each function handle all three journalling modes we now have
      three separate sets of address_space_operations.
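
      A minimal sketch of the "one operations table per mode" approach follows.
      The struct aops_sketch, the three handler functions and select_aops() are
      assumptions made for illustration; the real address_space_operations has
      many more members and the patch's actual function names are not shown here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down operations table. */
struct aops_sketch {
    const char *(*writepage)(void);
};

/* One handler per journalling mode; each knows only its own mode. */
static const char *ordered_writepage(void)    { return "ordered"; }
static const char *writeback_writepage(void)  { return "writeback"; }
static const char *journalled_writepage(void) { return "journalled"; }

static const struct aops_sketch ordered_aops    = { ordered_writepage };
static const struct aops_sketch writeback_aops  = { writeback_writepage };
static const struct aops_sketch journalled_aops = { journalled_writepage };

enum data_mode { DATA_ORDERED, DATA_WRITEBACK, DATA_JOURNAL };

/* Pick the table once, instead of branching on the mode inside every call. */
static const struct aops_sketch *select_aops(enum data_mode mode)
{
    switch (mode) {
    case DATA_ORDERED:   return &ordered_aops;
    case DATA_WRITEBACK: return &writeback_aops;
    case DATA_JOURNAL:   return &journalled_aops;
    }
    return &ordered_aops;   /* defensive default for out-of-range values */
}
```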
      
      The problematic part of data=journal is MAP_SHARED writepage traffic: pages
      which don't have buffers.  In 2.4 these were cheatingly treated as
      data-ordered buffers and that caused several nasty problems.
      
      Here we do it properly: writepage traffic is fully journalled.  This means
      that the various workarounds for the 2.4 scheme can be removed, when I
      remember where they all are.
      
      The PG_checked flag has been borrowed: it is set in the atomic set_page_dirty
      a_op to tell the subsequent writepage() that this page needs to have buffers
      attached, dirtied and journalled.
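
      The flag handoff described above might look roughly like this.  The bit
      value, struct page_sketch and both function names are hypothetical; the
      sketch only shows a non-sleeping setter leaving a note for a later
      writepage call.

```c
#include <assert.h>

#define PG_CHECKED_SKETCH (1UL << 0)   /* hypothetical bit position */

struct page_sketch {
    unsigned long flags;
    int journalled;   /* how many times writepage journalled this page */
};

/* Atomic context: only record that work is pending, do nothing that sleeps. */
static void sketch_set_page_dirty(struct page_sketch *page)
{
    page->flags |= PG_CHECKED_SKETCH;
}

/* Later, in a context that may sleep: consume the flag and do the work. */
static void sketch_writepage(struct page_sketch *page)
{
    if (page->flags & PG_CHECKED_SKETCH) {
        page->flags &= ~PG_CHECKED_SKETCH;
        page->journalled++;   /* stands in for: attach buffers, dirty, journal */
    }
}
```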
      
      This rather defines PG_checked as "fs-private info in page->flags" and it
      should be renamed sometime.
    • [PATCH] JBD: do_get_write_access() speedup · 8b7eec3b
      Andrew Morton authored
      Avoid holding the journal's j_list_lock while copying the buffer_head's data.
      We hold jbd_lock_bh_state() during the copy, which is all that is needed.
    • [PATCH] JBD: fix log_start_commit race · fba1fdee
      Andrew Morton authored
      In start_this_handle() the caller does not have a handle ref pinning the
      transaction open, and so the call to log_start_commit() is racy because some
      other CPU could take the transaction into commit state independently.
      
      Fix that by holding j_state_lock (which pins j_running_transaction) across
      the log_start_commit() call.
    • [PATCH] JBD: additional transaction shutdown locking · 28a4dd1b
      Andrew Morton authored
      Plug a conceivable race with the freeing up of transactions, and add some
      more debug checks.
    • [PATCH] JBD: add some locking assertions · 833f3d15
      Andrew Morton authored
      Drop in a few assertions to ensure that the locking rules are being adhered
      to.
    • [PATCH] JBD: buffer freeing non-race comment · eba4b4b7
      Andrew Morton authored
      Add a comment describing why a race isn't there.
    • [PATCH] ext3: ext3_writepage race fix · dd71e33f
      Andrew Morton authored
      After ext3_writepage() has called block_write_full_page() it will walk the
      page's buffer ring dropping the buffer_head refcounts.
      
      It does this wrong - on the final loop it will dereference the buffer_head
      which it just dropped the refcount on.  Poisoned oopses have been seen
      against bh->b_this_page.
      
      Change it to take a local copy of b_this_page prior to dropping the bh's
      refcount.
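
      The corrected loop shape can be sketched as follows.  struct bh_sketch and
      the helper names are hypothetical; "poisoning" is simulated by clearing the
      link when the refcount reaches zero, so a read after the put would be
      visibly wrong.

```c
#include <assert.h>
#include <stddef.h>

struct bh_sketch {
    struct bh_sketch *b_this_page;  /* next buffer in the page's circular ring */
    int refcount;
    int poisoned;   /* simulates the memory being freed and poisoned */
};

static void put_bh_sketch(struct bh_sketch *bh)
{
    if (--bh->refcount == 0) {
        bh->poisoned = 1;
        bh->b_this_page = NULL;   /* a stale read would now see garbage */
    }
}

/* Drop one reference on every buffer in the ring; returns buffers visited. */
static int walk_and_put(struct bh_sketch *head)
{
    struct bh_sketch *bh = head;
    int visited = 0;

    do {
        /* Copy the link BEFORE the put; bh must not be touched afterwards. */
        struct bh_sketch *next = bh->b_this_page;

        put_bh_sketch(bh);
        visited++;
        bh = next;
    } while (bh != head);   /* pointer comparison only, no dereference */

    return visited;
}
```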
    • [PATCH] JBD: journal_unmap_buffer race fix · e3380360
      Andrew Morton authored
      We need to check that buffer is still journalled _after_ taking the right
      locks.
    • [PATCH] JBD: journal_release_buffer: handle credits fix · 4b3044b0
      Andrew Morton authored
      There's a bug: a caller tries to journal a buffer and then decides he didn't
      want to after all.  He calls journal_release_buffer().
      
      But journal_release_buffer() is only allowed to give the caller a buffer
      credit back if it was the caller who added the buffer in the first place.
      
      journal_release_buffer() currently looks at the buffer state to work that
      out, but gets it wrong: if the buffer has been moved onto a different list by
      some other part of ext3 the credit is bogusly not returned to the caller and
      the fs can later go BUG due to handle credit exhaustion.
      
      
      The fix:
      
      Change journal_get_undo_access() to return the number of buffers which the
      caller actually added to the journal.  (one or zero).
      
      When the caller later calls journal_release_buffer(), he passes in that
      count, to tell journal_release_buffer() how many credits the caller should
      get back.
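
      A rough sketch of that calling convention, under the assumption of
      simplified stand-in types (handle_sketch, buf_sketch) and hypothetical
      function names; the real functions take different arguments and do far
      more work.  The release step here only returns credits and does not model
      removing the buffer from the journal.

```c
#include <assert.h>

struct handle_sketch { int buffer_credits; };
struct buf_sketch    { int journalled; };

/*
 * Returns how many credits THIS call consumed: 1 if this caller added the
 * buffer to the journal, 0 if it was already there.
 */
static int sketch_get_undo_access(struct handle_sketch *h, struct buf_sketch *b)
{
    if (b->journalled)
        return 0;
    assert(h->buffer_credits > 0);
    b->journalled = 1;
    h->buffer_credits--;
    return 1;
}

/* The caller hands back the count it was given, not a guess from buffer state. */
static void sketch_release_buffer(struct handle_sketch *h, int credits)
{
    h->buffer_credits += credits;
}
```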
      
      For API consistency this change should also be made to
      journal_get_create_access() and journal_get_write_access().  But there is no
      requirement for that in ext3 at this time.
      
      
      The remaining bug:
      
      This logic effectively gives another transaction handle a free buffer credit.
      These could conceivably accumulate and cause a journal overflow.  This is a
      separate problem and needs changes to the t_outstanding_credits accounting
      and the logic in start_this_handle.
    • [PATCH] JBD: remove lock_journal() · 9fe6d81a
      Andrew Morton authored
      This filesystem-wide sleeping lock is no longer needed.  Remove it.
    • [PATCH] JBD: remove lock_kernel() · f16f1182
      Andrew Morton authored
      lock_kernel() is no longer needed in JBD.  Remove all the lock_kernel() calls
      from fs/jbd/.
      
      Here is where I get to say "ex-parrot".
    • [PATCH] JBD: remove remaining sleep_on()s · b9c3dc07
      Andrew Morton authored
      Remove the remaining sleep_on() calls from JBD.
    • [PATCH] JBD: implement dual revoke tables. · ba8edd6d
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      We're about to remove lock_journal(), and it is lock_journal which separates
      the running and committing transaction's revokes on the single revoke table.
      
      So implement two revoke tables and rotate them at commit time.
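
      Rotating two tables at commit time could be sketched like this.  The
      fixed-size array, the struct layout and the function names are assumptions
      for illustration (the real revoke table is a hash table), and the sketch
      assumes the previous commit has finished with the table being reused.

```c
#include <assert.h>
#include <string.h>

#define REVOKE_SLOTS 8   /* hypothetical capacity */

struct revoke_table_sketch {
    unsigned long blocks[REVOKE_SLOTS];
    int count;
};

struct journal_sketch {
    struct revoke_table_sketch tables[2];
    int running;   /* index of the table the running transaction writes to */
};

/* Revokes from the running transaction always go to the current table. */
static void sketch_revoke(struct journal_sketch *j, unsigned long blocknr)
{
    struct revoke_table_sketch *t = &j->tables[j->running];

    assert(t->count < REVOKE_SLOTS);
    t->blocks[t->count++] = blocknr;
}

/*
 * At commit time: the filled table is handed to the committing transaction,
 * and the other table is emptied and becomes the running one.
 */
static struct revoke_table_sketch *sketch_switch_tables(struct journal_sketch *j)
{
    struct revoke_table_sketch *committing = &j->tables[j->running];

    j->running ^= 1;
    j->tables[j->running].count = 0;
    return committing;
}
```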
    • [PATCH] JBD: implement j_commit_request locking · ca340395
      Andrew Morton authored
      Implement the designed locking around journal->j_commit_request.
    • [PATCH] JBD: implement journal->j_commit_sequence locking · 6b65bc1f
      Andrew Morton authored
      Implement the designed locking around journal->j_commit_sequence.
    • [PATCH] JBD: implement journal->j_free locking · e3a03fb8
      Andrew Morton authored
      Implement the designed locking around journal->j_free.
      
      Things get a lot better here, too.
    • [PATCH] JBD: implement journal->j_tail locking · 2e89f6eb
      Andrew Morton authored
      Implement the designed locking around journal->j_tail.
    • [PATCH] JBD: implement journal->j_head locking · 23ce7898
      Andrew Morton authored
      Implement the designed locking around journal->j_head.
    • [PATCH] JBD: implement j_checkpoint_transactions locking · 2d16ce3a
      Andrew Morton authored
      Implement the designed locking around j_checkpoint_transactions.  It was all
      pretty much there actually.
    • [PATCH] JBD: implement j_committing_transaction locking · 36c3ce5d
      Andrew Morton authored
      Go through all sites which use j_committing_transaction and ensure that the
      designed locking is correctly implemented there.
    • [PATCH] JBD: implement j_running_transaction locking · e63ebf6b
      Andrew Morton authored
      Implement the designed locking around journal->j_running_transaction.
      
      A lot more of the new locking scheme falls into place.
    • [PATCH] JBD: implement j_barrier_count locking · 152dede7
      Andrew Morton authored
      We now start to move onto the fields of the topmost JBD data structure: the
      journal.
      
      The patch implements the designed locking around the j_barrier_count member.
      And as a part of that, a lot of the new locking scheme is implemented.
      Several lock_kernel()s and sleep_on()s go away.
    • [PATCH] JBD: implement t_jcb locking · 516e0cf7
      Andrew Morton authored
      Provide the designed locking around the transaction's t_jcb callback list.
      
      It turns out that this is wholly redundant at present.
    • [PATCH] JBD: implement t_outstanding_credits locking · 8c379633
      Andrew Morton authored
      Implement the designed locking for t_outstanding_credits.
    • [PATCH] JBD: t_updates locking · 9642d82c
      Andrew Morton authored
      Provide the designed locking for transaction_t.t_updates.
    • [PATCH] JBD: t_nr_buffers locking · 48fdf3e6
      Andrew Morton authored
      Now we move more into the locking of the transaction_t fields.
      
      t_nr_buffers locking is just an audit-and-commentary job.
    • [PATCH] JBD: remove journal_datalist_lock · 0a63cac6
      Andrew Morton authored
      This was a system-wide spinlock.
      
      Simple transformation: make it a filesystem-wide spinlock, in the JBD
      journal.
      
      That's a bit lame, and later it might be nice to make it per-transaction_t.
      But there are interesting ranking and ordering problems with that, especially
      around __journal_refile_buffer().
    • [PATCH] JBD: b_tnext locking · 1fe87216
      Andrew Morton authored
      Implement the designed b_tnext locking.
      
      This also covers b_tprev locking.
    • [PATCH] JBD: Implement b_next_transaction locking rules · e87dd8c3
      Andrew Morton authored
      Go through all b_next_transaction instances, implement locking rules.
      (Nothing to do here - b_transaction locking covered it)
    • [PATCH] JBD: implement b_transaction locking rules · e821ceb2
      Andrew Morton authored
      Go through all use of b_transaction and implement the rules.
      
      Fairly straightforward.
    • [PATCH] JBD: implement b_committed_data locking · b07da5e5
      Andrew Morton authored
      Implement the designed locking schema around the
      journal_head.b_committed_data field.
    • [PATCH] JBD: Finish protection of journal_head.b_frozen_data · 990aef1a
      Andrew Morton authored
      We now start to move across the JBD data structure's fields, from "innermost"
      and outwards.
      
      Start with journal_head.b_frozen_data, because the locking for this field was
      partially implemented in jbd-010-b_committed_data-race-fix.patch.
      
      It is protected by jbd_lock_bh_state().  We keep the lock_journal() and
      spin_lock(&journal_datalist_lock) calls in place.  Later,
      spin_lock(&journal_datalist_lock) is replaced by
      spin_lock(&journal->j_list_lock).
      
      Of course, this completion of the locking around b_frozen_data also puts a
      lot of the locking for other fields in place.
    • [PATCH] JBD: rename journal_unlock_journal_head to · eacf9510
      Andrew Morton authored
      journal_unlock_journal_head() is misnamed: what it does is to drop a ref on
      the journal_head and free it if that ref fell to zero.  It doesn't actually
      unlock anything.
      
      Rename it to journal_put_journal_head().
    • [PATCH] JBD: fine-grain journal_add_journal_head locking · 1c69516f
      Andrew Morton authored
      buffer_heads and journal_heads are joined at the hip.  We need a lock to
      protect the joint and its refcounts.
      
      JBD is currently using a global spinlock for that.  Change it to use one bit
      in bh->b_state.
    • [PATCH] JBD: remove jh_splice_lock · 6fe2ab38
      Andrew Morton authored
      This was a strange spinlock which was designed to prevent another CPU from
      ripping a buffer's journal_head away while this CPU was inspecting its state.
      
      Really, we don't need it - we can inspect that state directly from bh->b_state.
      
      So kill it off, along with a few things which used it which are themselves
      not actually used any more.
    • [PATCH] JBD: plan JBD locking schema · 13d8498a
      Andrew Morton authored
      This is the start of the JBD locking rework.
      
      The aims of all this are to remove all lock_kernel() calls from JBD, to
      remove all lock_journal() calls (the context switch rate is astonishing when
      the lock_kernel()s are removed) and to remove all sleep_on() instances.
      
      
      
      
      The strategy which is taken is:
      
      a) Define the locking schema (this patch)
      
      b) Work through every JBD data structure and implement its locking fully,
         according to the above schema.  We work from "innermost" data structures
         and outwards.
      
      It isn't guaranteed that the filesystem will work very well at all stages of
      this patch series.
      
      
      
      In this patch:
      
      
      Add commentary and various locks to jbd.h describing the locking scheme which
      is about to be implemented.
      
      Initialise the new locks.
      
      Coding-style goodness in jbd.h
    • [PATCH] JBD: fix race over access to b_committed_data · 47bb09d8
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      We have a race wherein the block allocator can decide that
      journal_head.b_committed_data is present and then will use it.  But kjournald
      can concurrently free it and set the pointer to NULL.  It goes oops.
      
      We introduce per-buffer_head "spinlocking" based on a bit in b_state.  To do
      this we abstract out pte_chain_lock() and reuse the implementation.
      
      The bit-based spinlocking is pretty inefficient CPU-wise (hence the warning
      in there) and we may move this to a hashed spinlock later.
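
      For illustration, a lock built from one bit of a state word might look
      like the userspace sketch below, written with C11 atomics rather than the
      kernel's bit operations.  The bit number, the struct and the function
      names are hypothetical.  Every spin iteration is an atomic
      read-modify-write on the shared word, which is one plausible source of the
      CPU cost mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>

#define BH_STATE_LOCK_SKETCH (1UL << 20)   /* hypothetical bit number */

struct bh_state_sketch {
    atomic_ulong b_state;   /* other state bits share this word with the lock */
};

/* Try once; returns 1 if we set the bit, 0 if someone else already held it. */
static int sketch_trylock_bh_state(struct bh_state_sketch *bh)
{
    unsigned long old = atomic_fetch_or_explicit(&bh->b_state,
                                                 BH_STATE_LOCK_SKETCH,
                                                 memory_order_acquire);
    return !(old & BH_STATE_LOCK_SKETCH);
}

static void sketch_lock_bh_state(struct bh_state_sketch *bh)
{
    while (!sketch_trylock_bh_state(bh))
        ;   /* spin until the holder clears the bit */
}

/* Clear only the lock bit; the other state bits are left untouched. */
static void sketch_unlock_bh_state(struct bh_state_sketch *bh)
{
    atomic_fetch_and_explicit(&bh->b_state, ~BH_STATE_LOCK_SKETCH,
                              memory_order_release);
}
```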
    • [PATCH] ext3: scalable counters and locks · 17aff938
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      This is a port from ext2 of the fuzzy counters (for Orlov allocator
      heuristics) and the hashed spinlocking (for the inode and block allocators).
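
      Hashed spinlocking, in general terms, means a small fixed array of locks
      indexed by a hash of the object being protected.  The sketch below uses
      C11 atomic_flag in userspace; the array size, the modulo hash and all
      names are assumptions, not the ext2/ext3 implementation.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_HASHED_LOCKS 4   /* hypothetical; far fewer locks than groups */

static atomic_flag hashed_locks[NR_HASHED_LOCKS] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Several groups share one lock; unrelated groups usually do not contend. */
static unsigned int lock_index(unsigned int group)
{
    return group % NR_HASHED_LOCKS;
}

/* Returns 1 if the lock covering this group was acquired, 0 if it was held. */
static int group_trylock(unsigned int group)
{
    return !atomic_flag_test_and_set_explicit(&hashed_locks[lock_index(group)],
                                              memory_order_acquire);
}

static void group_lock(unsigned int group)
{
    while (!group_trylock(group))
        ;   /* spin */
}

static void group_unlock(unsigned int group)
{
    atomic_flag_clear_explicit(&hashed_locks[lock_index(group)],
                               memory_order_release);
}
```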
    • [PATCH] ext3: concurrent block/inode allocation · c12b9866
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      
      This patch weans ext3 off lock_super()-based protection for the inode and
      block allocators.
      
      It's basically the same as the ext2 changes.
      
      
      1) each group has its own spinlock, which is used for group counter
         modifications
      
      2) sb->s_free_blocks_count isn't used any more.  ext2_statfs() and
         find_group_orlov() loop over groups to count free blocks
      
      3) sb->s_free_blocks_count is recalculated at mount/umount/sync_super time
         in order to check consistency and to avoid fsck warnings
      
      4) reserved blocks are distributed over the last groups
      
      5) ext3_new_block() tries to use non-reserved blocks and if it fails then
         tries to use reserved blocks
      
      6) ext3_new_block() and ext3_free_blocks do not modify sb->s_free_blocks,
         therefore they do not call mark_buffer_dirty() for superblock's
         buffer_head.  This should reduce I/O a bit
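
      Points 1 and 2 above can be sketched as per-group counters, each guarded
      by its own lock, with the filesystem-wide total computed on demand by
      looping over the groups.  struct group_sketch and the function names are
      hypothetical, and the lock is a userspace C11 atomic_flag rather than a
      kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>

struct group_sketch {
    atomic_flag lock;            /* protects only this group's counter */
    unsigned long free_blocks;
};

static void sketch_group_lock(struct group_sketch *g)
{
    while (atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire))
        ;   /* spin */
}

static void sketch_group_unlock(struct group_sketch *g)
{
    atomic_flag_clear_explicit(&g->lock, memory_order_release);
}

/* Allocate one block from a group; no global counter or lock is touched. */
static int alloc_from_group(struct group_sketch *g)
{
    int ok = 0;

    sketch_group_lock(g);
    if (g->free_blocks > 0) {
        g->free_blocks--;
        ok = 1;
    }
    sketch_group_unlock(g);
    return ok;
}

/* statfs-style total: sum the per-group counters instead of reading one field. */
static unsigned long count_free_blocks(struct group_sketch *groups, int nr)
{
    unsigned long total = 0;

    for (int i = 0; i < nr; i++) {
        sketch_group_lock(&groups[i]);
        total += groups[i].free_blocks;
        sketch_group_unlock(&groups[i]);
    }
    return total;
}
```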
      
      
      Also fix orlov allocator boundary case:
      
      In the interests of SMP scalability the ext2 free blocks and free inodes
      counters are "approximate".  But there is a piece of code in the Orlov
      allocator which fails due to boundary conditions on really small
      filesystems.
      
      Fix that up via a final allocation pass which simply uses first-fit for
      allocation of a directory inode.
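
      The shape of such a fallback is sketched below.  The heuristic shown
      (accept only groups with strictly more free inodes than the average) is a
      made-up stand-in, not the Orlov allocator; it was chosen only because it
      also fails on a tiny filesystem, which lets the first-fit pass show its
      purpose.  All function names are hypothetical.

```c
#include <assert.h>

/* Stand-in heuristic pass; returns a group index, or -1 if none qualifies. */
static int find_group_heuristic(const int *free_inodes, int nr_groups)
{
    long total = 0;
    long avg;

    for (int i = 0; i < nr_groups; i++)
        total += free_inodes[i];
    avg = total / nr_groups;

    for (int i = 0; i < nr_groups; i++)
        if (free_inodes[i] > avg)
            return i;
    return -1;   /* e.g. a single group can never beat its own average */
}

/* Final pass: take the first group that has any free inode at all. */
static int find_group_first_fit(const int *free_inodes, int nr_groups)
{
    for (int i = 0; i < nr_groups; i++)
        if (free_inodes[i] > 0)
            return i;
    return -1;
}

static int find_group_dir(const int *free_inodes, int nr_groups)
{
    int group = find_group_heuristic(free_inodes, nr_groups);

    if (group >= 0)
        return group;
    return find_group_first_fit(free_inodes, nr_groups);
}
```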