  1. 29 Dec, 2003 2 commits
    • [PATCH] JBD: b_committed_data locking fix · 524e63d2
      Andrew Morton authored
      The locking rules say that b_committed_data is covered by
      jbd_lock_bh_state(), so implement that during the start of commit, while
      throwing away unused shadow buffers.
      
      I don't expect that there is really a race here, but them's the rules.
    • [PATCH] ext3 scheduling latency fix · 9e77aa68
      Andrew Morton authored
      Sometimes kjournald has to refile a huge number of buffers, because someone
      else wrote them out beforehand - they are all clean.
      
      This happens under a lock and scheduling latencies of 88 milliseconds on a
      2.7GHz CPU were observed.
      
      The patch forward-ports a little bit of the 2.4 low-latency patch to fix this
      problem.
      
      Worst-case on ext3 is now sub-half-millisecond, except for when the RCU
      dentry reaping softirq cuts in :(
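The lock-break technique the patch forward-ports can be sketched in userspace C. This is an illustrative analogue only: the function name, BATCH size, and the pthread mutex are stand-ins for the real journal lock and cond_resched().

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define BATCH 64

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Refile nr buffers under list_lock, but drop the lock and yield
 * every BATCH buffers so other tasks can run -- the lock-break
 * pattern the low-latency patch introduces.  Returns buffers done. */
int refile_all(int nr)
{
    int processed = 0;

    pthread_mutex_lock(&list_lock);
    while (processed < nr) {
        processed++;                     /* stand-in for refiling one buffer */
        if (processed % BATCH == 0) {
            pthread_mutex_unlock(&list_lock);
            sched_yield();               /* userspace cond_resched() analogue */
            pthread_mutex_lock(&list_lock);
        }
    }
    pthread_mutex_unlock(&list_lock);
    return processed;
}
```

Breaking the lock only every BATCH items keeps the per-item locking overhead amortized while bounding the worst-case hold time.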
  2. 22 Oct, 2003 1 commit
  3. 01 Aug, 2003 1 commit
    • [PATCH] ext3: fix commit assertion failure · b84ee08e
      Andrew Morton authored
      We're getting assertion failures in commit in data=journal mode.
      
      journal_unmap_buffer() has unexpectedly donated this buffer to the committing
      transaction, and the commit-time assertion doesn't expect that to happen.  It
      doesn't happen in 2.4 because both paths are under lock_journal().
      
      Simply remove the assertion: the commit code will uncheckpoint the buffer and
      then recheckpoint it if needed.
  4. 10 Jul, 2003 2 commits
    • [PATCH] JBD: transaction buffer accounting fix · 4152cdfa
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      start_this_handle() takes into account t_outstanding_credits when calculating
      log free space, but journal_next_log_block() accounts for blocks being logged
      also.  Hence, blocks are accounted twice.  This effectively reduces the
      amount of log space available to transactions and forces more commits.
      
      Fix it by decrementing t_outstanding_credits each time we allocate a new
      journal block.
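A toy model of the double-accounting arithmetic described above. The struct and helper names are hypothetical, not the real journal_t; the point is only the arithmetic.

```c
#include <assert.h>

/* Toy model of the JBD log-space arithmetic in the commit message. */
struct toy_journal {
    int j_free;                  /* free blocks left in the log */
    int t_outstanding_credits;   /* credits reserved by open handles */
};

/* Space a new handle may assume is available. */
int log_space_left(const struct toy_journal *j)
{
    return j->j_free - j->t_outstanding_credits;
}

/* Buggy: writing a block consumes j_free but leaves the credit in
 * place, so the same block is charged against us twice. */
void write_block_buggy(struct toy_journal *j)
{
    j->j_free--;
}

/* Fixed: drop one outstanding credit per journal block allocated. */
void write_block_fixed(struct toy_journal *j)
{
    j->j_free--;
    j->t_outstanding_credits--;
}

/* Reserve 10 credits against a 100-block log, write all 10 blocks,
 * and report how much space a new handle would then see. */
int space_after_writes(int fixed)
{
    struct toy_journal j = { .j_free = 100, .t_outstanding_credits = 10 };

    for (int i = 0; i < 10; i++)
        fixed ? write_block_fixed(&j) : write_block_buggy(&j);
    return log_space_left(&j);
}
```

With ten credits and ten blocks written, the buggy variant reports 80 free blocks instead of 90, which is exactly the lost log space that forced the extra commits.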
    • [PATCH] misc fixes · ecbaa730
      Andrew Morton authored
      - remove accidental debug code from ext3 commit.
      
      - /proc/profile documentation fix (Randy Dunlap)
      
      - use sb_breadahead() in ext2_preread_inode()
      
      - unused var in mpage_writepages()
  5. 02 Jul, 2003 1 commit
    • [PATCH] ext3: fix journal_release_buffer() race · 90153a16
      Andrew Morton authored
      		CPU0				CPU1
      
      	journal_get_write_access(bh)
      	 (Add buffer to t_reserved_list)
      
      					journal_get_write_access(bh)
      					 (It's already on t_reserved_list:
      					  nothing to do)
      
      	 (We decide we don't want to
      	  journal the buffer after all)
      	journal_release_buffer()
      	 (It gets pulled off the transaction)
      
      
      					journal_dirty_metadata()
      					 (The buffer isn't on the reserved
      					  list!  The kernel explodes)
      
      
      Simple fix: just leave the buffer on t_reserved_list in
      journal_release_buffer().  If nobody ends up claiming the buffer then it will
      get thrown away at start of transaction commit.
  6. 25 Jun, 2003 1 commit
    • [PATCH] ext3: fix memory leak · 508fc350
      Andrew Morton authored
      We need to unconditionally brelse() the buffer in there, because
      journal_remove_journal_head() leaves a ref behind.
      
      release_buffer_page() does that.  Call it all the time because we can usually
      strip the buffers and free the page even if it was not marked buffer_freed().
      
      Mainly affects data=journal mode
  7. 20 Jun, 2003 1 commit
  8. 18 Jun, 2003 18 commits
    • [PATCH] ext3: explicitly free truncated pages · 97c8087c
      Andrew Morton authored
      With data=ordered it is often the case that a quick write-and-truncate will
      leave large numbers of pages on the page LRU with no ->mapping, and attached
      buffers.  Because ext3 was not ready to let the pages go at the time of
      truncation.
      
      These pages are trivially reclaimable, but their seeming absence makes the VM
      overcommit accounting confused (they don't count as "free", nor as
      pagecache).  And they make the /proc/meminfo stats look odd.
      
      So what we do here is to try to strip the buffers from these pages as the
      buffers exit the journal commit.
    • [PATCH] JBD: fix race between journal_commit_transaction and · 2ab7407c
      Andrew Morton authored
      start_this_handle() can decide to add this handle to a transaction, but
      kjournald then moves the handle into commit phase.
      
      Extend the coverage of j_state_lock so that start_this_handle()'s
      examination of journal->j_state is atomic wrt journal_commit_transaction().
    • [PATCH] JBD: additional transaction shutdown locking · 28a4dd1b
      Andrew Morton authored
      Plug a conceivable race with the freeing up of transactions, and add some
      more debug checks.
    • [PATCH] JBD: remove lock_journal() · 9fe6d81a
      Andrew Morton authored
      This filesystem-wide sleeping lock is no longer needed.  Remove it.
    • [PATCH] JBD: remove lock_kernel() · f16f1182
      Andrew Morton authored
      lock_kernel() is no longer needed in JBD.  Remove all the lock_kernel() calls
      from fs/jbd/.
      
      Here is where I get to say "ex-parrot".
    • [PATCH] JBD: remove remaining sleep_on()s · b9c3dc07
      Andrew Morton authored
      Remove the remaining sleep_on() calls from JBD.
    • [PATCH] JBD: implement dual revoke tables. · ba8edd6d
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      We're about to remove lock_journal(), and it is lock_journal which separates
      the running and committing transaction's revokes on the single revoke table.
      
      So implement two revoke tables and rotate them at commit time.
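The rotation scheme can be sketched like this. This is a simplified model with hypothetical names: the real implementation hashes revoke records rather than storing them in flat arrays.

```c
#include <assert.h>

#define TABLE_SIZE 32

/* The running transaction records revokes in one table while commit
 * flushes the other; the two are swapped ("rotated") at commit time. */
struct toy_journal {
    int tables[2][TABLE_SIZE];
    int counts[2];
    int active;                /* index of the running transaction's table */
};

void revoke_block(struct toy_journal *j, int block)
{
    j->tables[j->active][j->counts[j->active]++] = block;
}

/* At commit: rotate so new revokes land in the empty table while
 * commit writes out the full one.  Returns the table to flush. */
int *commit_rotate(struct toy_journal *j, int *nr)
{
    int old = j->active;

    j->active ^= 1;
    j->counts[j->active] = 0;
    *nr = j->counts[old];
    return j->tables[old];
}

int demo(void)
{
    struct toy_journal j = { .active = 0 };
    int nr;

    revoke_block(&j, 7);
    revoke_block(&j, 9);
    int *to_flush = commit_rotate(&j, &nr);
    revoke_block(&j, 11);      /* lands in the other table, no lock needed */
    return nr == 2 && to_flush[0] == 7 && j.counts[1] == 1;
}
```

Because the committing and running transactions now touch disjoint tables, the serialization that lock_journal() used to provide is no longer needed for revokes.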
    • [PATCH] JBD: implement j_committing_transaction locking · 36c3ce5d
      Andrew Morton authored
      Go through all sites which use j_committing_transaction and ensure that the
      designed locking is correctly implemented there.
    • [PATCH] JBD: implement j_running_transaction locking · e63ebf6b
      Andrew Morton authored
      Implement the designed locking around journal->j_running_transaction.
      
      A lot more of the new locking scheme falls into place.
    • [PATCH] JBD: implement t_jcb locking · 516e0cf7
      Andrew Morton authored
      Provide the designed locking around the transaction's t_jcb callback list.
      
      It turns out that this is wholly redundant at present.
    • [PATCH] JBD: t_updates locking · 9642d82c
      Andrew Morton authored
      Provide the designed locking for transaction_t.t_updates.
    • [PATCH] JBD: remove journal_datalist_lock · 0a63cac6
      Andrew Morton authored
      This was a system-wide spinlock.
      
      Simple transformation: make it a filesystem-wide spinlock, in the JBD
      journal.
      
      That's a bit lame, and later it might be nice to make it per-transaction_t.
      But there are interesting ranking and ordering problems with that, especially
      around __journal_refile_buffer().
    • [PATCH] JBD: implement b_transaction locking rules · e821ceb2
      Andrew Morton authored
      Go through all use of b_transaction and implement the rules.
      
      Fairly straightforward.
    • [PATCH] JBD: Finish protection of journal_head.b_frozen_data · 990aef1a
      Andrew Morton authored
      We now start to move across the JBD data structure's fields, from "innermost"
      and outwards.
      
      Start with journal_head.b_frozen_data, because the locking for this field was
      partially implemented in jbd-010-b_committed_data-race-fix.patch.
      
      It is protected by jbd_lock_bh_state().  We keep the lock_journal() and
      spin_lock(&journal_datalist_lock) calls in place.  Later,
      spin_lock(&journal_datalist_lock) is replaced by
      spin_lock(&journal->j_list_lock).
      
      Of course, this completion of the locking around b_frozen_data also puts a
      lot of the locking for other fields in place.
    • [PATCH] JBD: rename journal_unlock_journal_head to · eacf9510
      Andrew Morton authored
      journal_unlock_journal_head() is misnamed: what it does is to drop a ref on
      the journal_head and free it if that ref fell to zero.  It doesn't actually
      unlock anything.
      
      Rename it to journal_put_journal_head().
    • [PATCH] JBD: fine-grain journal_add_journal_head locking · 1c69516f
      Andrew Morton authored
      buffer_heads and journal_heads are joined at the hip.  We need a lock to
      protect the joint and its refcounts.
      
      JBD is currently using a global spinlock for that.  Change it to use one bit
      in bh->b_state.
    • [PATCH] JBD: remove jh_splice_lock · 6fe2ab38
      Andrew Morton authored
      This was a strange spinlock which was designed to prevent another CPU from
      ripping a buffer's journal_head away while this CPU was inspecting its state.
      
      Really, we don't need it - we can inspect that state directly from bh->b_state.
      
      So kill it off, along with a few things which used it which are themselves
      not actually used any more.
    • [PATCH] JBD: fix race over access to b_committed_data · 47bb09d8
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      We have a race wherein the block allocator can decide that
      journal_head.b_committed_data is present and then will use it.  But kjournald
      can concurrently free it and set the pointer to NULL.  It goes oops.
      
      We introduce per-buffer_head "spinlocking" based on a bit in b_state.  To do
      this we abstract out pte_chain_lock() and reuse the implementation.
      
      The bit-based spinlocking is pretty inefficient CPU-wise (hence the warning
      in there) and we may move this to a hashed spinlock later.
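A userspace rendering of the bit-based spinlock idea, using C11 atomics in place of the kernel's test_and_set_bit(). The bit number and function names here are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>

#define JBD_LOCK_BIT 5   /* one bit of the state word acts as the lock */

void bit_spin_lock(atomic_ulong *word)
{
    /* Spin until our fetch_or is the one that sets the bit.  Busy
     * waiting on a shared word is CPU-inefficient -- hence the
     * warning mentioned in the original patch. */
    while (atomic_fetch_or(word, 1UL << JBD_LOCK_BIT) & (1UL << JBD_LOCK_BIT))
        ;
}

void bit_spin_unlock(atomic_ulong *word)
{
    atomic_fetch_and(word, ~(1UL << JBD_LOCK_BIT));
}

int bit_locked(atomic_ulong *word)
{
    return (atomic_load(word) >> JBD_LOCK_BIT) & 1;
}

int demo(void)
{
    atomic_ulong state = 0;
    int was_locked;

    bit_spin_lock(&state);
    was_locked = bit_locked(&state);
    bit_spin_unlock(&state);
    return was_locked && !bit_locked(&state);
}
```

The attraction is that the lock costs no extra storage: it lives in a spare bit of a word the structure already has, which is why a hashed spinlock was left as a possible later refinement.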
  9. 03 Apr, 2003 1 commit
    • [PATCH] ext3 journal commit I/O error fix · 68569684
      Andrew Morton authored
      From: Hua Zhong <hzhong@cisco.com>
      
      ext3 currently ignores I/O errors that occur during a
      journal_force_commit, causing user space to falsely believe the
      operation succeeded when it actually did not.
      
      This patch checks for I/O errors during journal_commit_transaction() and
      aborts the journal when an I/O error occurs.
      
      Originally I thought about reporting the error without aborting the
      journal, but that would probably need a new flag.  Aborting the journal
      seems to be the easiest way to signal "hey, something is wrong..".
  10. 10 Feb, 2003 1 commit
    • [PATCH] Fix synchronous writers to wait properly for the result · 8d49bf3f
      Andrew Morton authored
      Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> points out a bug in
      ll_rw_block() usage.
      
      Typical usage is:
      
      	mark_buffer_dirty(bh);
      	ll_rw_block(WRITE, 1, &bh);
      	wait_on_buffer(bh);
      
      the problem is that if the buffer was locked on entry to this code sequence
      (due to in-progress I/O), ll_rw_block() will not wait, and start new I/O.  So
      this code will wait on the _old_ I/O, and will then continue execution,
      leaving the buffer dirty.
      
      It turns out that all callers were only writing one buffer, and they were all
      waiting on that writeout.  So I added a new sync_dirty_buffer() function:
      
      	void sync_dirty_buffer(struct buffer_head *bh)
      	{
      		lock_buffer(bh);
      		if (test_clear_buffer_dirty(bh)) {
      			get_bh(bh);
      			bh->b_end_io = end_buffer_io_sync;
      			submit_bh(WRITE, bh);
      		} else {
      			unlock_buffer(bh);
      		}
      	}
      
      which allowed a fair amount of code to be removed, while adding the desired
      data-integrity guarantees.
      
      UFS has its own wrappers around ll_rw_block() which got in the way, so this
      operation was open-coded in that case.
  11. 14 Jan, 2003 1 commit
    • [PATCH] fix ext3 memory leak · 2a6cb303
      Andrew Morton authored
      This is the leak which Con found.  Long story...
      
      - If a dirty page is fed into ext3_writepage() during truncate,
        block_write_full_page() will return -EIO (it's outside i_size) and will
        leave the buffers dirty.  In the expectation that discard_buffer() will
        clean them.
      
      - ext3_writepage() then adds the still-dirty buffers to the journal's
        "async data list".  These are buffers which are known to have had IO
        started.  All we need to do is to wait on them in commit.
      
      - meanwhile, truncate will chop the pages off the address_space.  But
        truncate cannot invalidate the buffers (in journal_unmap_buffer()) because
        the buffers are attached to the committing transaction.  (hm.  This
        behaviour in journal_unmap_buffer() is bogus.  We just never need to write
        these buffers.)
      
      - ext3 commit will "wait on writeout" of these writepage buffers (even
        though it was never started) and will then release them from the
        journalling system.
      
      So we end up with pages which are attached to no mapping, which are clean and
      which have dirty buffers.  These are unreclaimable.
      
      
      Aside:
      
        ext3-ordered has two buffer lists: the "sync data list" and the "async
        data list".
      
        The sync list consists of probably-dirty buffers which were dirtied in
        commit_write().  Transaction commit must write all these out and wait on
        them.
      
        The async list supposedly consists of clean buffers which were attached
        to the journal in ->writepage.  These have had IO started (by writepage) so
        commit merely needs to wait on them.
      
        This is all designed for the 2.4 VM really.  In 2.5, tons of writeback
        goes via writepage (instead of the buffer lru) and these buffers end up
        madly hopping between the async and sync lists.
      
        Plus it's arguably incorrect to just wait on the writes in commit - if
        the buffers were set dirty again (say, by zap_pte_range()) then perhaps we
        should write them again before committing.
      
      
      So what the patch does is to remove the async list.  All ordered-data buffers
      are now attached to the single "sync data list".  So when we come to commit,
      those buffers which are dirty will have IO started and all buffers are waited
      upon.
      
      This means that the dirty buffers against a clean page which came about from
      block_write_full_page()'s -EIO will be written to disk in commit - this
      cleans them, and the page is now reclaimable.  No leak.
      
      It seems bogus to write these buffers in commit, and indeed it is.  But ext3
      will not allow those blocks to be reused until the commit has ended so there
      is no corruption risk.  And the amount of data involved is low - it only
      comes about as a race between truncate and writepage().
  12. 09 Oct, 2002 1 commit
    • [PATCH] 64-bit sector_t - printk changes and sector_t cleanup · be48ef9e
      Andrew Morton authored
      From Peter Chubb
      
      printk changes: A sector_t can be either 64 or 32 bits, so cast it to a
      printable type that is at least as large as 64-bits on all platforms
      (i.e., cast to unsigned long long and use a %llu format)
      
      Transition to 64-bit sector_t: fix isofs_get_blocks by converting the
      (possibly 64-bit) arg to a long.
      
      SCSI 64-bit sector_t cleanup: capacity now stored as sector_t; make
      sure that the READ_CAPACITY command doesn't sign-extend its returned
      value; avoid 64-bit division when printing size in MB.
      
      Still to do:
       - 16-byte SCSI commands
       - Individual scsi drivers.
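The printk convention described above, shown as a userspace sketch. The sector_t typedef here is a stand-in for the kernel's config-dependent type, and format_sector() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* sector_t may be 32 or 64 bits depending on configuration, so always
 * widen to unsigned long long and print with %llu -- the same rule the
 * patch applies to printk call sites. */
typedef uint64_t sector_t;   /* could equally be uint32_t */

int format_sector(char *buf, size_t len, sector_t s)
{
    return snprintf(buf, len, "%llu", (unsigned long long)s);
}

/* Format a sector beyond 2^32 and check the digits survive intact. */
int demo(void)
{
    char buf[32];

    format_sector(buf, sizeof(buf), ((sector_t)1) << 33);
    return strcmp(buf, "8589934592") == 0;
}
```

Without the cast, a 32-bit sector_t passed to a %llu conversion would read garbage from the varargs area on most ABIs, so the cast is needed even when sector_t happens to be 64 bits.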
  13. 04 Jul, 2002 1 commit
    • [PATCH] JBD commit callback capability · 8b00e4fa
      Andrew Morton authored
      This is a patch which Stephen has applied to ext3's 2.4 repository.
      Originally written by Andreas, generalised somewhat by Stephen.
      
      Add jbd callback mechanism, requested for InterMezzo.  We allow the jbd's
      client to request notification when a given handle's IO finally commits to
      disk, so that clients can manage their own writeback state asynchronously.
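The callback mechanism can be sketched as follows. The structure and function names are simplified stand-ins modelled on JBD's per-transaction callback list, not the actual API.

```c
#include <assert.h>
#include <stddef.h>

/* A client registers a callback on a handle; "commit" invokes every
 * registered callback once the transaction's IO reaches disk, passing
 * any error so the client can manage its own writeback state. */
typedef void (*jcb_fn)(void *arg, int error);

struct toy_callback {
    jcb_fn fn;
    void *arg;
    struct toy_callback *next;
};

struct toy_handle {
    struct toy_callback *callbacks;   /* stand-in for the t_jcb list */
};

void toy_callback_set(struct toy_handle *h, struct toy_callback *cb,
                      jcb_fn fn, void *arg)
{
    cb->fn = fn;
    cb->arg = arg;
    cb->next = h->callbacks;
    h->callbacks = cb;
}

/* Called by "commit" once the transaction is on disk. */
void toy_run_callbacks(struct toy_handle *h, int error)
{
    for (struct toy_callback *cb = h->callbacks; cb; cb = cb->next)
        cb->fn(cb->arg, error);
    h->callbacks = NULL;
}

static void note_done(void *arg, int error)
{
    *(int *)arg = 1 + error;
}

int demo(void)
{
    struct toy_handle h = { NULL };
    struct toy_callback cb;
    int done = 0;

    toy_callback_set(&h, &cb, note_done, &done);
    toy_run_callbacks(&h, 0);
    return done;
}
```

Having the caller supply the callback node (rather than allocating inside the journal) keeps the commit path free of allocation failures, which is the usual kernel idiom for such lists.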
  14. 18 Jun, 2002 1 commit
    • [PATCH] ext3 corruption fix · afb51f81
      Andrew Morton authored
      Stephen and Neil Brown recently worked this out.  It's a
      rare situation which only affects data=journal mode.
      
      Fix problem in data=journal mode where writeback could be left pending on a
      journaled, deleted disk block.  If that block then gets reallocated, we can
      end up with an alias in which the old data can be written back to disk over
      the new.  Thanks to Neil Brown for spotting this and coming up with the
      initial fix.
  15. 20 May, 2002 1 commit
    • [PATCH] get rid of <linux/locks.h> · bd2b0c85
      Christoph Hellwig authored
      The locks.h header contained some hand-crafted locking routines from
      the pre-SMP days.  In 2.5 only lock_super/unlock_super are left,
      guarded by a number of completely unrelated (!) includes.
      
      This patch moves lock_super/unlock_super to fs.h, which defines the
      struct super_block they operate on, removes locks.h, and updates all
      callers to not include it, adding the missing, previously nested
      includes where needed.
  16. 05 May, 2002 1 commit
    • [PATCH] Fix concurrent writepage and readpage · d58e41ee
      Andrew Morton authored
      Pages under writeback are not locked.  So it is possible (and quite
      legal) for a page to be under readpage() while it is still under
      writeback, in the case of a partially uptodate page with blocksize <
      PAGE_CACHE_SIZE.
      
      When this happens, the read and write I/O completion handlers get
      confused over the shared BH_Async usage and the page ends up not
      getting PG_writeback cleared.  Truncate gets stuck in D state.
      
      The patch separates the read and write I/O completion state.
      
      It also shuffles the buffer fields around.  Putting the
      commonly-accessed b_state at offset zero shrinks the kernel by a few
      hundred bytes because it can be accessed with indirect addressing, not
      indirect+indexed.
  17. 30 Apr, 2002 4 commits
    • [PATCH] hashed b_wait · f15fe424
      Andrew Morton authored
      Implements hashed waitqueues for buffer_heads.  Drops twelve bytes from
      struct buffer_head.
    • [PATCH] cleanup of bh->flags · 39e8cdf7
      Andrew Morton authored
      Moves all buffer_head-related stuff out of linux/fs.h and into
      linux/buffer_head.h.  buffer_head.h is currently included at the very
      end of fs.h.  So it is possible to include buffer_head directly from
      all .c files and remove this nested include.
      
      Also rationalises all the set_buffer_foo() and mark_buffer_bar()
      functions.  We have:
      
      	set_buffer_foo(bh)
      	clear_buffer_foo(bh)
      	buffer_foo(bh)
      
      and, in some cases, where needed:
      
      	test_set_buffer_foo(bh)
      	test_clear_buffer_foo(bh)
      
      And that's it.
      
      BUFFER_FNS() and TAS_BUFFER_FNS() macros generate all the above real
      inline functions.  Normally not a big fan of cpp abuse, but in this
      case it fits.  These function-generating macros are available to
      filesystems to expand their own b_state functions.  JBD uses this in
      one case.
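The generator idea looks roughly like this in userspace form. This is a simplified rendering of the BUFFER_FNS() pattern, not the kernel macro; the struct and bit numbers are illustrative.

```c
#include <assert.h>

struct toy_bh {
    unsigned long b_state;
};

/* One macro invocation expands to the set/clear/test accessors for a
 * given state bit -- the cpp trick the commit message describes. */
#define BUFFER_FNS(bit, name)                                           \
static void set_buffer_##name(struct toy_bh *bh)                        \
{                                                                       \
    bh->b_state |= 1UL << (bit);                                        \
}                                                                       \
static void clear_buffer_##name(struct toy_bh *bh)                      \
{                                                                       \
    bh->b_state &= ~(1UL << (bit));                                     \
}                                                                       \
static int buffer_##name(const struct toy_bh *bh)                       \
{                                                                       \
    return (bh->b_state >> (bit)) & 1;                                  \
}

BUFFER_FNS(0, uptodate)
BUFFER_FNS(1, dirty)

int demo(void)
{
    struct toy_bh bh = { 0 };

    set_buffer_dirty(&bh);
    set_buffer_uptodate(&bh);
    clear_buffer_uptodate(&bh);
    clear_buffer_dirty(&bh);
    set_buffer_dirty(&bh);
    return buffer_dirty(&bh) && !buffer_uptodate(&bh);
}
```

A filesystem can invoke the same macro with its own bit to grow private b_state accessors, which is how JBD uses it.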
    • [PATCH] remove buffer unused_list · 4beda7c1
      Andrew Morton authored
      Removes the buffer_head unused list.  Use a mempool instead.
      
      The reduced lock contention provided about a 10% boost on Anton's
      12-way.
    • [PATCH] writeback from address spaces · 090da372
      Andrew Morton authored
      [ I reversed the order in which writeback walks the superblock's
        dirty inodes.  It sped up dbench's unlink phase greatly.  I'm
        such a sleaze ]
      
      The core writeback patch.  Switches file writeback from the dirty
      buffer LRU over to address_space.dirty_pages.
      
      - The buffer LRU is removed
      
      - The buffer hash is removed (uses blockdev pagecache lookups)
      
      - The bdflush and kupdate functions are implemented against
        address_spaces, via pdflush.
      
      - The relationship between pages and buffers is changed.
      
        - If a page has dirty buffers, it is marked dirty
        - If a page is marked dirty, it *may* have dirty buffers.
        - A dirty page may be "partially dirty".  block_write_full_page
          discovers this.
      
      - A bunch of consistency checks of the form
      
      	if (!something_which_should_be_true())
      		buffer_error();
      
        have been introduced.  These fog the code up but are important for
        ensuring that the new buffer/page code is working correctly.
      
      - New locking (inode.i_bufferlist_lock) is introduced for exclusion
        from try_to_free_buffers().  This is needed because set_page_dirty
        is called under spinlock, so it cannot lock the page.  But it
        needs access to page->buffers to set them all dirty.
      
        i_bufferlist_lock is also used to protect inode.i_dirty_buffers.
      
      - fs/inode.c has been split: all the code related to file data writeback
        has been moved into fs/fs-writeback.c
      
      - Code related to file data writeback at the address_space level is in
        the new mm/page-writeback.c
      
      - try_to_free_buffers() is now non-blocking
      
      - Switches vmscan.c over to understand that all pages with dirty data
        are now marked dirty.
      
      - Introduces a new a_op for VM writeback:
      
      	->vm_writeback(struct page *page, int *nr_to_write)
      
        this is a bit half-baked at present.  The intent is that the address_space
        is given the opportunity to perform clustered writeback.  To allow it to
        opportunistically write out disk-contiguous dirty data which may be in other zones.
        To allow delayed-allocate filesystems to get good disk layout.
      
      - Added address_space.io_pages.  Pages which are being prepared for
        writeback.  This is here for two reasons:
      
        1: It will be needed later, when BIOs are assembled direct
           against pagecache, bypassing the buffer layer.  It avoids a
           deadlock which would occur if someone moved the page back onto the
           dirty_pages list after it was added to the BIO, but before it was
           submitted.  (hmm.  This may not be a problem with PG_writeback logic).
      
        2: Avoids a livelock which would occur if some other thread is continually
           redirtying pages.
      
      - There are two known performance problems in this code:
      
        1: Pages which are locked for writeback cause undesirable
           blocking when they are being overwritten.  A patch which leaves
           pages unlocked during writeback comes later in the series.
      
        2: While inodes are under writeback, they are locked.  This
           causes namespace lookups against the file to get unnecessarily
           blocked in wait_on_inode().  This is a fairly minor problem.
      
           I don't have a fix for this at present - I'll fix this when I
           attach dirty address_spaces direct to super_blocks.
      
      - The patch vastly increases the amount of dirty data which the
        kernel permits highmem machines to maintain.  This is because the
        balancing decisions are made against the amount of memory in the
        machine, not against the amount of buffercache-allocatable memory.
      
        This may be very wrong, although it works fine for me (2.5 gigs).
      
        We can trivially go back to the old-style throttling with
        s/nr_free_pagecache_pages/nr_free_buffer_pages/ in
        balance_dirty_pages().  But better would be to allow blockdev
        mappings to use highmem (I'm thinking about this one, slowly).  And
        to move writer-throttling and writeback decisions into the VM (modulo
        the file-overwriting problem).
      
      - Drops 24 bytes from struct buffer_head.  More to come.
      
      - There's some gunk like super_block.flags:MS_FLUSHING which needs to
        be killed.  Need a better way of providing collision avoidance
        between pdflush threads, to prevent more than one pdflush thread
        working a disk at the same time.
      
        The correct way to do that is to put a flag in the request queue to
        say "there's a pdflush thread working this disk".  This is easy to
        do: just generalise the "ra_pages" pointer to point at a struct which
        includes ra_pages and the new collision-avoidance flag.
  18. 09 Feb, 2002 1 commit