  1. 29 Dec, 2003 2 commits
    • [PATCH] JBD: b_committed_data locking fix · 524e63d2
      Andrew Morton authored
      The locking rules say that b_committed_data is covered by
      jbd_lock_bh_state(), so implement that during the start of commit, while
      throwing away unused shadow buffers.
      
      I don't expect that there is really a race here, but them's the rules.
    • [PATCH] ext3 scheduling latency fix · 9e77aa68
      Andrew Morton authored
      Sometimes kjournald has to refile a huge number of buffers, because someone
      else wrote them out beforehand - they are all clean.
      
      This happens under a lock and scheduling latencies of 88 milliseconds on a
      2.7GHz CPU were observed.
      
      The patch forward-ports a little bit of the 2.4 low-latency patch to fix this
      problem.
      
      Worst-case on ext3 is now sub-half-millisecond, except for when the RCU
      dentry reaping softirq cuts in :(
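The lock-break technique the patch forward-ports can be sketched in userspace C. This is an illustrative analogue only: the function name, BATCH size, and the pthread mutex are stand-ins for the real journal lock and cond_resched().

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define BATCH 64

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Refile nr buffers under list_lock, but drop the lock and yield
 * every BATCH buffers so other tasks can run -- the lock-break
 * pattern the low-latency patch introduces.  Returns buffers done. */
int refile_all(int nr)
{
    int processed = 0;

    pthread_mutex_lock(&list_lock);
    while (processed < nr) {
        processed++;                     /* stand-in for refiling one buffer */
        if (processed % BATCH == 0) {
            pthread_mutex_unlock(&list_lock);
            sched_yield();               /* userspace cond_resched() analogue */
            pthread_mutex_lock(&list_lock);
        }
    }
    pthread_mutex_unlock(&list_lock);
    return processed;
}
```

Breaking the lock only every BATCH items keeps the per-item locking overhead amortized while bounding the worst-case hold time.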
  2. 22 Oct, 2003 1 commit
  3. 01 Aug, 2003 1 commit
    • [PATCH] ext3: fix commit assertion failure · b84ee08e
      Andrew Morton authored
      We're getting assertion failures in commit in data=journal mode.
      
      journal_unmap_buffer() has unexpectedly donated this buffer to the committing
      transaction, and the commit-time assertion doesn't expect that to happen.  It
      doesn't happen in 2.4 because both paths are under lock_journal().
      
      Simply remove the assertion: the commit code will uncheckpoint the buffer and
      then recheckpoint it if needed.
  4. 10 Jul, 2003 2 commits
    • [PATCH] JBD: transaction buffer accounting fix · 4152cdfa
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      start_this_handle() takes into account t_outstanding_credits when calculating
      log free space, but journal_next_log_block() accounts for blocks being logged
      also.  Hence, blocks are accounted twice.  This effectively reduces the
      amount of log space available to transactions and forces more commits.
      
      Fix it by decrementing t_outstanding_credits each time we allocate a new
      journal block.
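A toy model of the double-accounting arithmetic described above. The struct and helper names are hypothetical, not the real journal_t; the point is only the arithmetic.

```c
#include <assert.h>

/* Toy model of the JBD log-space arithmetic in the commit message. */
struct toy_journal {
    int j_free;                  /* free blocks left in the log */
    int t_outstanding_credits;   /* credits reserved by open handles */
};

/* Space a new handle may assume is available. */
int log_space_left(const struct toy_journal *j)
{
    return j->j_free - j->t_outstanding_credits;
}

/* Buggy: writing a block consumes j_free but leaves the credit in
 * place, so the same block is charged against us twice. */
void write_block_buggy(struct toy_journal *j)
{
    j->j_free--;
}

/* Fixed: drop one outstanding credit per journal block allocated. */
void write_block_fixed(struct toy_journal *j)
{
    j->j_free--;
    j->t_outstanding_credits--;
}

/* Reserve 10 credits against a 100-block log, write all 10 blocks,
 * and report how much space a new handle would then see. */
int space_after_writes(int fixed)
{
    struct toy_journal j = { .j_free = 100, .t_outstanding_credits = 10 };

    for (int i = 0; i < 10; i++)
        fixed ? write_block_fixed(&j) : write_block_buggy(&j);
    return log_space_left(&j);
}
```

With ten credits and ten blocks written, the buggy variant reports 80 free blocks instead of 90, which is exactly the lost log space that forced the extra commits.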
    • [PATCH] misc fixes · ecbaa730
      Andrew Morton authored
      - remove accidental debug code from ext3 commit.
      
      - /proc/profile documentation fix (Randy Dunlap)
      
      - use sb_breadahead() in ext2_preread_inode()
      
      - unused var in mpage_writepages()
  5. 02 Jul, 2003 1 commit
    • [PATCH] ext3: fix journal_release_buffer() race · 90153a16
      Andrew Morton authored
      		CPU0				CPU1
      
      	journal_get_write_access(bh)
      	 (Add buffer to t_reserved_list)
      
      					journal_get_write_access(bh)
      					 (It's already on t_reserved_list:
      					  nothing to do)
      
      	 (We decide we don't want to
      	  journal the buffer after all)
      	journal_release_buffer()
      	 (It gets pulled off the transaction)
      
      
      					journal_dirty_metadata()
      					 (The buffer isn't on the reserved
      					  list!  The kernel explodes)
      
      
      Simple fix: just leave the buffer on t_reserved_list in
      journal_release_buffer().  If nobody ends up claiming the buffer then it will
      get thrown away at start of transaction commit.
  6. 25 Jun, 2003 1 commit
    • [PATCH] ext3: fix memory leak · 508fc350
      Andrew Morton authored
      We need to unconditionally brelse() the buffer in there, because
      journal_remove_journal_head() leaves a ref behind.
      
      release_buffer_page() does that.  Call it all the time because we can usually
      strip the buffers and free the page even if it was not marked buffer_freed().
      
      Mainly affects data=journal mode
  7. 20 Jun, 2003 1 commit
  8. 18 Jun, 2003 18 commits
    • [PATCH] ext3: explicitly free truncated pages · 97c8087c
      Andrew Morton authored
      With data=ordered it is often the case that a quick write-and-truncate will
      leave large numbers of pages on the page LRU with no ->mapping, and attached
      buffers.  Because ext3 was not ready to let the pages go at the time of
      truncation.
      
      These pages are trivially reclaimable, but their seeming absence makes the VM
      overcommit accounting confused (they don't count as "free", nor as
      pagecache).  And they make the /proc/meminfo stats look odd.
      
      So what we do here is to try to strip the buffers from these pages as the
      buffers exit the journal commit.
    • [PATCH] JBD: fix race between journal_commit_transaction and · 2ab7407c
      Andrew Morton authored
      start_this_handle() can decide to add this handle to a transaction, but
      kjournald then moves the handle into commit phase.
      
      Extend the coverage of j_state_lock so that start_this_handle()'s
      examination of journal->j_state is atomic wrt journal_commit_transaction().
    • [PATCH] JBD: additional transaction shutdown locking · 28a4dd1b
      Andrew Morton authored
      Plug a conceivable race with the freeing up of transactions, and add some
      more debug checks.
    • [PATCH] JBD: remove lock_journal() · 9fe6d81a
      Andrew Morton authored
      This filesystem-wide sleeping lock is no longer needed.  Remove it.
    • [PATCH] JBD: remove lock_kernel() · f16f1182
      Andrew Morton authored
      lock_kernel() is no longer needed in JBD.  Remove all the lock_kernel() calls
      from fs/jbd/.
      
      Here is where I get to say "ex-parrot".
    • [PATCH] JBD: remove remaining sleep_on()s · b9c3dc07
      Andrew Morton authored
      Remove the remaining sleep_on() calls from JBD.
    • [PATCH] JBD: implement dual revoke tables. · ba8edd6d
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      We're about to remove lock_journal(), and it is lock_journal which separates
      the running and committing transaction's revokes on the single revoke table.
      
      So implement two revoke tables and rotate them at commit time.
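The rotation scheme can be sketched like this. This is a simplified model with hypothetical names: the real implementation hashes revoke records rather than storing them in flat arrays.

```c
#include <assert.h>

#define TABLE_SIZE 32

/* The running transaction records revokes in one table while commit
 * flushes the other; the two are swapped ("rotated") at commit time. */
struct toy_journal {
    int tables[2][TABLE_SIZE];
    int counts[2];
    int active;                /* index of the running transaction's table */
};

void revoke_block(struct toy_journal *j, int block)
{
    j->tables[j->active][j->counts[j->active]++] = block;
}

/* At commit: rotate so new revokes land in the empty table while
 * commit writes out the full one.  Returns the table to flush. */
int *commit_rotate(struct toy_journal *j, int *nr)
{
    int old = j->active;

    j->active ^= 1;
    j->counts[j->active] = 0;
    *nr = j->counts[old];
    return j->tables[old];
}

int demo(void)
{
    struct toy_journal j = { .active = 0 };
    int nr;

    revoke_block(&j, 7);
    revoke_block(&j, 9);
    int *to_flush = commit_rotate(&j, &nr);
    revoke_block(&j, 11);      /* lands in the other table, no lock needed */
    return nr == 2 && to_flush[0] == 7 && j.counts[1] == 1;
}
```

Because the committing and running transactions now touch disjoint tables, the serialization that lock_journal() used to provide is no longer needed for revokes.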
    • [PATCH] JBD: implement j_committing_transaction locking · 36c3ce5d
      Andrew Morton authored
      Go through all sites which use j_committing_transaction and ensure that the
      designed locking is correctly implemented there.
    • [PATCH] JBD: implement j_running_transaction locking · e63ebf6b
      Andrew Morton authored
      Implement the designed locking around journal->j_running_transaction.
      
      A lot more of the new locking scheme falls into place.
    • [PATCH] JBD: implement t_jcb locking · 516e0cf7
      Andrew Morton authored
      Provide the designed locking around the transaction's t_jcb callback list.
      
      It turns out that this is wholly redundant at present.
    • [PATCH] JBD: t_updates locking · 9642d82c
      Andrew Morton authored
      Provide the designed locking for transaction_t.t_updates.
    • [PATCH] JBD: remove journal_datalist_lock · 0a63cac6
      Andrew Morton authored
      This was a system-wide spinlock.
      
      Simple transformation: make it a filesystem-wide spinlock, in the JBD
      journal.
      
      That's a bit lame, and later it might be nice to make it per-transaction_t.
      But there are interesting ranking and ordering problems with that, especially
      around __journal_refile_buffer().
    • [PATCH] JBD: implement b_transaction locking rules · e821ceb2
      Andrew Morton authored
      Go through all use of b_transaction and implement the rules.
      
      Fairly straightforward.
    • [PATCH] JBD: Finish protection of journal_head.b_frozen_data · 990aef1a
      Andrew Morton authored
      We now start to move across the JBD data structure's fields, from "innermost"
      and outwards.
      
      Start with journal_head.b_frozen_data, because the locking for this field was
      partially implemented in jbd-010-b_committed_data-race-fix.patch.
      
      It is protected by jbd_lock_bh_state().  We keep the lock_journal() and
      spin_lock(&journal_datalist_lock) calls in place.  Later,
      spin_lock(&journal_datalist_lock) is replaced by
      spin_lock(&journal->j_list_lock).
      
      Of course, this completion of the locking around b_frozen_data also puts a
      lot of the locking for other fields in place.
    • [PATCH] JBD: rename journal_unlock_journal_head to · eacf9510
      Andrew Morton authored
      journal_unlock_journal_head() is misnamed: what it does is to drop a ref on
      the journal_head and free it if that ref fell to zero.  It doesn't actually
      unlock anything.
      
      Rename it to journal_put_journal_head().
    • [PATCH] JBD: fine-grain journal_add_journal_head locking · 1c69516f
      Andrew Morton authored
      buffer_heads and journal_heads are joined at the hip.  We need a lock to
      protect the joint and its refcounts.
      
      JBD is currently using a global spinlock for that.  Change it to use one bit
      in bh->b_state.
    • [PATCH] JBD: remove jh_splice_lock · 6fe2ab38
      Andrew Morton authored
      This was a strange spinlock which was designed to prevent another CPU from
      ripping a buffer's journal_head away while this CPU was inspecting its state.
      
      Really, we don't need it - we can inspect that state directly from bh->b_state.
      
      So kill it off, along with a few things which used it which are themselves
      not actually used any more.
    • [PATCH] JBD: fix race over access to b_committed_data · 47bb09d8
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      We have a race wherein the block allocator can decide that
      journal_head.b_committed_data is present and then will use it.  But kjournald
      can concurrently free it and set the pointer to NULL.  It goes oops.
      
      We introduce per-buffer_head "spinlocking" based on a bit in b_state.  To do
      this we abstract out pte_chain_lock() and reuse the implementation.
      
      The bit-based spinlocking is pretty inefficient CPU-wise (hence the warning
      in there) and we may move this to a hashed spinlock later.
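A userspace rendering of the bit-based spinlock idea, using C11 atomics in place of the kernel's test_and_set_bit(). The bit number and function names here are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>

#define JBD_LOCK_BIT 5   /* one bit of the state word acts as the lock */

void bit_spin_lock(atomic_ulong *word)
{
    /* Spin until our fetch_or is the one that sets the bit.  Busy
     * waiting on a shared word is CPU-inefficient -- hence the
     * warning mentioned in the original patch. */
    while (atomic_fetch_or(word, 1UL << JBD_LOCK_BIT) & (1UL << JBD_LOCK_BIT))
        ;
}

void bit_spin_unlock(atomic_ulong *word)
{
    atomic_fetch_and(word, ~(1UL << JBD_LOCK_BIT));
}

int bit_locked(atomic_ulong *word)
{
    return (atomic_load(word) >> JBD_LOCK_BIT) & 1;
}

int demo(void)
{
    atomic_ulong state = 0;
    int was_locked;

    bit_spin_lock(&state);
    was_locked = bit_locked(&state);
    bit_spin_unlock(&state);
    return was_locked && !bit_locked(&state);
}
```

The attraction is that the lock costs no extra storage: it lives in a spare bit of a word the structure already has, which is why a hashed spinlock was left as a possible later refinement.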
  9. 03 Apr, 2003 1 commit
    • [PATCH] ext3 journal commit I/O error fix · 68569684
      Andrew Morton authored
      From: Hua Zhong <hzhong@cisco.com>
      
      ext3 currently ignores I/O errors that occur during a
      journal_force_commit, causing user space to falsely believe the
      operation succeeded when it actually did not.
      
      This patch checks for I/O errors during journal_commit_transaction() and
      aborts the journal when an I/O error occurs.
      
      Originally I thought about reporting the error without aborting the
      journal, but that would probably need a new flag.  Aborting the journal
      seems to be the easiest way to signal "hey, something is wrong..".
  10. 10 Feb, 2003 1 commit
    • [PATCH] Fix synchronous writers to wait properly for the result · 8d49bf3f
      Andrew Morton authored
      Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> points out a bug in
      ll_rw_block() usage.
      
      Typical usage is:
      
      	mark_buffer_dirty(bh);
      	ll_rw_block(WRITE, 1, &bh);
      	wait_on_buffer(bh);
      
      the problem is that if the buffer was locked on entry to this code sequence
      (due to in-progress I/O), ll_rw_block() will not wait, and start new I/O.  So
      this code will wait on the _old_ I/O, and will then continue execution,
      leaving the buffer dirty.
      
      It turns out that all callers were only writing one buffer, and they were all
      waiting on that writeout.  So I added a new sync_dirty_buffer() function:
      
      	void sync_dirty_buffer(struct buffer_head *bh)
      	{
      		lock_buffer(bh);
      		if (test_clear_buffer_dirty(bh)) {
      			get_bh(bh);
      			bh->b_end_io = end_buffer_io_sync;
      			submit_bh(WRITE, bh);
      		} else {
      			unlock_buffer(bh);
      		}
      	}
      
      which allowed a fair amount of code to be removed, while adding the desired
      data-integrity guarantees.
      
      UFS has its own wrappers around ll_rw_block() which got in the way, so this
      operation was open-coded in that case.
  11. 14 Jan, 2003 1 commit
    • [PATCH] fix ext3 memory leak · 2a6cb303
      Andrew Morton authored
      This is the leak which Con found.  Long story...
      
      - If a dirty page is fed into ext3_writepage() during truncate,
        block_write_full_page() will return -EIO (it's outside i_size) and will
        leave the buffers dirty.  In the expectation that discard_buffer() will
        clean them.
      
      - ext3_writepage() then adds the still-dirty buffers to the journal's
        "async data list".  These are buffers which are known to have had IO
        started.  All we need to do is to wait on them in commit.
      
      - meanwhile, truncate will chop the pages off the address_space.  But
        truncate cannot invalidate the buffers (in journal_unmap_buffer()) because
        the buffers are attached to the committing transaction.  (hm.  This
        behaviour in journal_unmap_buffer() is bogus.  We just never need to write
        these buffers.)
      
      - ext3 commit will "wait on writeout" of these writepage buffers (even
        though it was never started) and will then release them from the
        journalling system.
      
      So we end up with pages which are attached to no mapping, which are clean and
      which have dirty buffers.  These are unreclaimable.
      
      
      Aside:
      
        ext3-ordered has two buffer lists: the "sync data list" and the "async
        data list".
      
        The sync list consists of probably-dirty buffers which were dirtied in
        commit_write().  Transaction commit must write all these out and wait on
        them.
      
        The async list supposedly consists of clean buffers which were attached
        to the journal in ->writepage.  These have had IO started (by writepage) so
        commit merely needs to wait on them.
      
        This is all designed for the 2.4 VM really.  In 2.5, tons of writeback
        goes via writepage (instead of the buffer lru) and these buffers end up
        madly hopping between the async and sync lists.
      
        Plus it's arguably incorrect to just wait on the writes in commit - if
        the buffers were set dirty again (say, by zap_pte_range()) then perhaps we
        should write them again before committing.
      
      
      So what the patch does is to remove the async list.  All ordered-data buffers
      are now attached to the single "sync data list".  So when we come to commit,
      those buffers which are dirty will have IO started and all buffers are waited
      upon.
      
      This means that the dirty buffers against a clean page which came about from
      block_write_full_page()'s -EIO will be written to disk in commit - this
      cleans them, and the page is now reclaimable.  No leak.
      
      It seems bogus to write these buffers in commit, and indeed it is.  But ext3
      will not allow those blocks to be reused until the commit has ended so there
      is no corruption risk.  And the amount of data involved is low - it only
      comes about as a race between truncate and writepage().
  12. 09 Oct, 2002 1 commit
    • [PATCH] 64-bit sector_t - printk changes and sector_t cleanup · be48ef9e
      Andrew Morton authored
      From Peter Chubb
      
      printk changes: A sector_t can be either 64 or 32 bits, so cast it to a
      printable type that is at least as large as 64-bits on all platforms
      (i.e., cast to unsigned long long and use a %llu format)
      
      Transition to 64-bit sector_t: fix isofs_get_blocks by converting the
      (possibly 64-bit) arg to a long.
      
      SCSI 64-bit sector_t cleanup: capacity now stored as sector_t; make
      sure that the READ_CAPACITY command doesn't sign-extend its returned
      value; avoid 64-bit division when printing size in MB.
      
      Still to do:
       - 16-byte SCSI commands
       - Individual scsi drivers.
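The printk convention described above, shown as a userspace sketch. The sector_t typedef here is a stand-in for the kernel's config-dependent type, and format_sector() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* sector_t may be 32 or 64 bits depending on configuration, so always
 * widen to unsigned long long and print with %llu -- the same rule the
 * patch applies to printk call sites. */
typedef uint64_t sector_t;   /* could equally be uint32_t */

int format_sector(char *buf, size_t len, sector_t s)
{
    return snprintf(buf, len, "%llu", (unsigned long long)s);
}

/* Format a sector beyond 2^32 and check the digits survive intact. */
int demo(void)
{
    char buf[32];

    format_sector(buf, sizeof(buf), ((sector_t)1) << 33);
    return strcmp(buf, "8589934592") == 0;
}
```

Without the cast, a 32-bit sector_t passed to a %llu conversion would read garbage from the varargs area on most ABIs, so the cast is needed even when sector_t happens to be 64 bits.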
  13. 04 Jul, 2002 1 commit
    • [PATCH] JBD commit callback capability · 8b00e4fa
      Andrew Morton authored
      This is a patch which Stephen has applied to ext3's 2.4 repository.
      Originally written by Andreas, generalised somewhat by Stephen.
      
      Add jbd callback mechanism, requested for InterMezzo.  We allow the jbd's
      client to request notification when a given handle's IO finally commits to
      disk, so that clients can manage their own writeback state asynchronously.
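The callback mechanism can be sketched as follows. The structure and function names are simplified stand-ins modelled on JBD's per-transaction callback list, not the actual API.

```c
#include <assert.h>
#include <stddef.h>

/* A client registers a callback on a handle; "commit" invokes every
 * registered callback once the transaction's IO reaches disk, passing
 * any error so the client can manage its own writeback state. */
typedef void (*jcb_fn)(void *arg, int error);

struct toy_callback {
    jcb_fn fn;
    void *arg;
    struct toy_callback *next;
};

struct toy_handle {
    struct toy_callback *callbacks;   /* stand-in for the t_jcb list */
};

void toy_callback_set(struct toy_handle *h, struct toy_callback *cb,
                      jcb_fn fn, void *arg)
{
    cb->fn = fn;
    cb->arg = arg;
    cb->next = h->callbacks;
    h->callbacks = cb;
}

/* Called by "commit" once the transaction is on disk. */
void toy_run_callbacks(struct toy_handle *h, int error)
{
    for (struct toy_callback *cb = h->callbacks; cb; cb = cb->next)
        cb->fn(cb->arg, error);
    h->callbacks = NULL;
}

static void note_done(void *arg, int error)
{
    *(int *)arg = 1 + error;
}

int demo(void)
{
    struct toy_handle h = { NULL };
    struct toy_callback cb;
    int done = 0;

    toy_callback_set(&h, &cb, note_done, &done);
    toy_run_callbacks(&h, 0);
    return done;
}
```

Having the caller supply the callback node (rather than allocating inside the journal) keeps the commit path free of allocation failures, which is the usual kernel idiom for such lists.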
  14. 18 Jun, 2002 1 commit
    • [PATCH] ext3 corruption fix · afb51f81
      Andrew Morton authored
      Stephen and Neil Brown recently worked this out.  It's a
      rare situation which only affects data=journal mode.
      
      Fix problem in data=journal mode where writeback could be left pending on a
      journaled, deleted disk block.  If that block then gets reallocated, we can
      end up with an alias in which the old data can be written back to disk over
      the new.  Thanks to Neil Brown for spotting this and coming up with the
      initial fix.
  15. 20 May, 2002 1 commit
    • [PATCH] get rid of <linux/locks.h> · bd2b0c85
      Christoph Hellwig authored
      The locks.h header contained some hand-crafted locking routines from
      the pre-SMP days.  In 2.5 only lock_super/unlock_super are left,
      guarded by a number of completely unrelated (!) includes.
      
      This patch moves lock_super/unlock_super to fs.h, which defines the
      struct super_block they operate on, removes locks.h, and updates all
      callers to not include it, adding the missing, previously nested
      includes where needed.
  16. 05 May, 2002 1 commit
    • [PATCH] Fix concurrent writepage and readpage · d58e41ee
      Andrew Morton authored
      Pages under writeback are not locked.  So it is possible (and quite
      legal) for a page to be under readpage() while it is still under
      writeback, in the case of a partially uptodate page with blocksize <
      PAGE_CACHE_SIZE.
      
      When this happens, the read and write I/O completion handlers get
      confused over the shared BH_Async usage and the page ends up not
      getting PG_writeback cleared.  Truncate gets stuck in D state.
      
      The patch separates the read and write I/O completion state.
      
      It also shuffles the buffer fields around.  Putting the
      commonly-accessed b_state at offset zero shrinks the kernel by a few
      hundred bytes because it can be accessed with indirect addressing, not
      indirect+indexed.
  17. 30 Apr, 2002 4 commits
    • [PATCH] hashed b_wait · f15fe424
      Andrew Morton authored
      Implements hashed waitqueues for buffer_heads.  Drops twelve bytes from
      struct buffer_head.
    • [PATCH] cleanup of bh->flags · 39e8cdf7
      Andrew Morton authored
      Moves all buffer_head-related stuff out of linux/fs.h and into
      linux/buffer_head.h.  buffer_head.h is currently included at the very
      end of fs.h.  So it is possible to include buffer_head directly from
      all .c files and remove this nested include.
      
      Also rationalises all the set_buffer_foo() and mark_buffer_bar()
      functions.  We have:
      
      	set_buffer_foo(bh)
      	clear_buffer_foo(bh)
      	buffer_foo(bh)
      
      and, in some cases, where needed:
      
      	test_set_buffer_foo(bh)
      	test_clear_buffer_foo(bh)
      
      And that's it.
      
      BUFFER_FNS() and TAS_BUFFER_FNS() macros generate all the above real
      inline functions.  Normally not a big fan of cpp abuse, but in this
      case it fits.  These function-generating macros are available to
      filesystems to expand their own b_state functions.  JBD uses this in
      one case.
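The generator idea looks roughly like this in userspace form. This is a simplified rendering of the BUFFER_FNS() pattern, not the kernel macro; the struct and bit numbers are illustrative.

```c
#include <assert.h>

struct toy_bh {
    unsigned long b_state;
};

/* One macro invocation expands to the set/clear/test accessors for a
 * given state bit -- the cpp trick the commit message describes. */
#define BUFFER_FNS(bit, name)                                           \
static void set_buffer_##name(struct toy_bh *bh)                        \
{                                                                       \
    bh->b_state |= 1UL << (bit);                                        \
}                                                                       \
static void clear_buffer_##name(struct toy_bh *bh)                      \
{                                                                       \
    bh->b_state &= ~(1UL << (bit));                                     \
}                                                                       \
static int buffer_##name(const struct toy_bh *bh)                       \
{                                                                       \
    return (bh->b_state >> (bit)) & 1;                                  \
}

BUFFER_FNS(0, uptodate)
BUFFER_FNS(1, dirty)

int demo(void)
{
    struct toy_bh bh = { 0 };

    set_buffer_dirty(&bh);
    set_buffer_uptodate(&bh);
    clear_buffer_uptodate(&bh);
    clear_buffer_dirty(&bh);
    set_buffer_dirty(&bh);
    return buffer_dirty(&bh) && !buffer_uptodate(&bh);
}
```

A filesystem can invoke the same macro with its own bit to grow private b_state accessors, which is how JBD uses it.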
    • [PATCH] remove buffer unused_list · 4beda7c1
      Andrew Morton authored
      Removes the buffer_head unused list.  Use a mempool instead.
      
      The reduced lock contention provided about a 10% boost on Anton's
      12-way.
    • [PATCH] writeback from address spaces · 090da372
      Andrew Morton authored
      [ I reversed the order in which writeback walks the superblock's
        dirty inodes.  It sped up dbench's unlink phase greatly.  I'm
        such a sleaze ]
      
      The core writeback patch.  Switches file writeback from the dirty
      buffer LRU over to address_space.dirty_pages.
      
      - The buffer LRU is removed
      
      - The buffer hash is removed (uses blockdev pagecache lookups)
      
      - The bdflush and kupdate functions are implemented against
        address_spaces, via pdflush.
      
      - The relationship between pages and buffers is changed.
      
        - If a page has dirty buffers, it is marked dirty
        - If a page is marked dirty, it *may* have dirty buffers.
        - A dirty page may be "partially dirty".  block_write_full_page
          discovers this.
      
      - A bunch of consistency checks of the form
      
      	if (!something_which_should_be_true())
      		buffer_error();
      
        have been introduced.  These fog the code up but are important for
        ensuring that the new buffer/page code is working correctly.
      
      - New locking (inode.i_bufferlist_lock) is introduced for exclusion
        from try_to_free_buffers().  This is needed because set_page_dirty
        is called under spinlock, so it cannot lock the page.  But it
        needs access to page->buffers to set them all dirty.
      
        i_bufferlist_lock is also used to protect inode.i_dirty_buffers.
      
      - fs/inode.c has been split: all the code related to file data writeback
        has been moved into fs/fs-writeback.c
      
      - Code related to file data writeback at the address_space level is in
        the new mm/page-writeback.c
      
      - try_to_free_buffers() is now non-blocking
      
      - Switches vmscan.c over to understand that all pages with dirty data
        are now marked dirty.
      
      - Introduces a new a_op for VM writeback:
      
      	->vm_writeback(struct page *page, int *nr_to_write)
      
        this is a bit half-baked at present.  The intent is that the address_space
        is given the opportunity to perform clustered writeback.  To allow it to
        opportunistically write out disk-contiguous dirty data which may be in other zones.
        To allow delayed-allocate filesystems to get good disk layout.
      
      - Added address_space.io_pages.  Pages which are being prepared for
        writeback.  This is here for two reasons:
      
        1: It will be needed later, when BIOs are assembled direct
           against pagecache, bypassing the buffer layer.  It avoids a
           deadlock which would occur if someone moved the page back onto the
           dirty_pages list after it was added to the BIO, but before it was
           submitted.  (hmm.  This may not be a problem with PG_writeback logic).
      
        2: Avoids a livelock which would occur if some other thread is continually
           redirtying pages.
      
      - There are two known performance problems in this code:
      
        1: Pages which are locked for writeback cause undesirable
           blocking when they are being overwritten.  A patch which leaves
           pages unlocked during writeback comes later in the series.
      
        2: While inodes are under writeback, they are locked.  This
           causes namespace lookups against the file to get unnecessarily
           blocked in wait_on_inode().  This is a fairly minor problem.
      
           I don't have a fix for this at present - I'll fix this when I
           attach dirty address_spaces direct to super_blocks.
      
      - The patch vastly increases the amount of dirty data which the
        kernel permits highmem machines to maintain.  This is because the
        balancing decisions are made against the amount of memory in the
        machine, not against the amount of buffercache-allocatable memory.
      
        This may be very wrong, although it works fine for me (2.5 gigs).
      
        We can trivially go back to the old-style throttling with
        s/nr_free_pagecache_pages/nr_free_buffer_pages/ in
        balance_dirty_pages().  But better would be to allow blockdev
        mappings to use highmem (I'm thinking about this one, slowly).  And
        to move writer-throttling and writeback decisions into the VM (modulo
        the file-overwriting problem).
      
      - Drops 24 bytes from struct buffer_head.  More to come.
      
      - There's some gunk like super_block.flags:MS_FLUSHING which needs to
        be killed.  Need a better way of providing collision avoidance
        between pdflush threads, to prevent more than one pdflush thread
        working a disk at the same time.
      
        The correct way to do that is to put a flag in the request queue to
        say "there's a pdflush thread working this disk".  This is easy to
        do: just generalise the "ra_pages" pointer to point at a struct which
        includes ra_pages and the new collision-avoidance flag.
  18. 09 Feb, 2002 1 commit