Commits · 28a4dd1ba782d5b0f27dca9632296a452c204286 · Kirill Smelkov / linux

18 Jun, 2003 39 commits

[PATCH] JBD: additional transaction shutdown locking · 28a4dd1b
Andrew Morton authored Jun 17, 2003
```
Plug a conceivable race with the freeing up of transactions, and add some
more debug checks.
```
[PATCH] JBD: add some locking assertions · 833f3d15
Andrew Morton authored Jun 17, 2003
```
Drop in a few assertions to ensure that the locking rules are being adhered
to.
```
[PATCH] JBD: buffer freeing non-race comment · eba4b4b7
Andrew Morton authored Jun 17, 2003
```
Add a comment describing why a race isn't there.
```

[PATCH] ext3: ext3_writepage race fix · dd71e33f

Andrew Morton authored Jun 17, 2003

After ext3_writepage() has called block_write_full_page() it will walk the
page's buffer ring dropping the buffer_head refcounts.

It does this wrong - on the final loop it will dereference the buffer_head
which it just dropped the refcount on.  Poisoned oopses have been seen
against bh->b_this_page.

Change it to take a local copy of b_this_page prior to dropping the bh's
refcount.

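A userspace C sketch of the pattern behind this fix follows. struct bh_sketch, put_bh_sketch() and drop_ring_refs() are hypothetical stand-ins, not the ext3 code. put_bh_sketch() frees a buffer when its count reaches zero only so that a use-after-put would be a real bug here, and the counting pass is an extra precaution of the sketch, so the loop never has to compare against a head buffer that may already be gone.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct buffer_head, reduced to the two
 * fields this fix is about. */
struct bh_sketch {
	struct bh_sketch *b_this_page;	/* next buffer in the page's ring */
	int b_count;			/* reference count */
};

static int freed;	/* how many buffers were released, for the demo */

/* Hypothetical put: once it returns, the caller must assume the buffer
 * may already be gone and must not dereference it again. */
static void put_bh_sketch(struct bh_sketch *bh)
{
	if (--bh->b_count == 0) {
		free(bh);
		freed++;
	}
}

/* Drop one reference on every buffer in the ring that starts at head. */
static void drop_ring_refs(struct bh_sketch *head)
{
	int n = 0;
	struct bh_sketch *bh = head;

	/* Count the ring while every reference is still held. */
	do {
		n++;
		bh = bh->b_this_page;
	} while (bh != head);

	bh = head;
	for (int i = 0; i < n; i++) {
		/* The fix: read b_this_page BEFORE dropping the reference. */
		struct bh_sketch *next = (i + 1 < n) ? bh->b_this_page : NULL;

		put_bh_sketch(bh);
		bh = next;	/* the old bh is never touched after the put */
	}
}
```

With a ring of three buffers each holding one reference, drop_ring_refs() frees all three without reading any of them after its put.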

[PATCH] JBD: journal_unmap_buffer race fix · e3380360
Andrew Morton authored Jun 17, 2003
```
We need to check that the buffer is still journalled _after_ taking the right
locks.
```

[PATCH] JBD: journal_release_buffer: handle credits fix · 4b3044b0

Andrew Morton authored Jun 17, 2003

There's a bug: a caller tries to journal a buffer and then decides he didn't
want to after all.  He calls journal_release_buffer().

But journal_release_buffer() is only allowed to give the caller a buffer
credit back if it was the caller who added the buffer in the first place.

journal_release_buffer() currently looks at the buffer state to work that
out, but gets it wrong: if the buffer has been moved onto a different list by
some other part of ext3 the credit is bogusly not returned to the caller and
the fs can later go BUG due to handle credit exhaustion.


The fix:

Change journal_get_undo_access() to return the number of buffers which the
caller actually added to the journal (one or zero).

When the caller later calls journal_release_buffer(), he passes in that
count, to tell journal_release_buffer() how many credits the caller should
get back.

For API consistency this change should also be made to
journal_get_create_access() and journal_get_write_access().  But there is no
requirement for that in ext3 at this time.

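The revised contract can be sketched in C as below. Everything here is hypothetical and heavily simplified: a handle is reduced to its credit count, a buffer to a single flag, and the real functions' signatures, locking and error handling are not reproduced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical handle: only its remaining buffer credits. */
struct handle_sketch {
	int h_buffer_credits;
};

/* Hypothetical buffer: only whether it is already in the journal. */
struct jbuf_sketch {
	bool journalled;
};

/* Charge a credit only if this call actually adds the buffer, and
 * report how many buffers it added: one or zero. */
static int get_undo_access_sketch(struct handle_sketch *h,
				  struct jbuf_sketch *b)
{
	if (b->journalled)
		return 0;	/* already journalled: no credit used */
	b->journalled = true;
	h->h_buffer_credits--;
	return 1;
}

/* The caller hands back the count it was given, so the refund no
 * longer depends on buffer state that may have changed meanwhile. */
static void release_buffer_sketch(struct handle_sketch *h, int added)
{
	h->h_buffer_credits += added;
}
```

A caller that really adds the buffer gets 1 back and is refunded one credit on release; a caller that finds the buffer already journalled gets 0 and is refunded nothing, whichever list the buffer has moved to in the meantime.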

The remaining bug:

This logic effectively gives another transaction handle a free buffer credit.
These could conceivably accumulate and cause a journal overflow.  This is a
separate problem and needs changes to the t_outstanding_credits accounting
and the logic in start_this_handle.


[PATCH] JBD: remove lock_journal() · 9fe6d81a
Andrew Morton authored Jun 17, 2003
```
This filesystem-wide sleeping lock is no longer needed.  Remove it.
```

[PATCH] JBD: remove lock_kernel() · f16f1182

Andrew Morton authored Jun 17, 2003

lock_kernel() is no longer needed in JBD.  Remove all the lock_kernel() calls
from fs/jbd/.

Here is where I get to say "ex-parrot".


[PATCH] JBD: remove remaining sleep_on()s · b9c3dc07
Andrew Morton authored Jun 17, 2003
```
Remove the remaining sleep_on() calls from JBD.
```

[PATCH] JBD: implement dual revoke tables. · ba8edd6d

Andrew Morton authored Jun 17, 2003

From: Alex Tomas <bzzz@tmi.comex.ru>

We're about to remove lock_journal(), and it is lock_journal() which separates
the running and committing transactions' revokes on the single revoke table.

So implement two revoke tables and rotate them at commit time.

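A sketch of the two-table rotation, in C. The structure and function names are hypothetical, a small fixed array stands in for the real revoke table, and the locking that makes the switch safe against concurrent revokers is omitted.

```c
#include <assert.h>

#define REVOKE_SLOTS 8	/* hypothetical, tiny fixed-size table */

/* Hypothetical revoke table: block numbers revoked by one transaction. */
struct revoke_table_sketch {
	unsigned long blocks[REVOKE_SLOTS];
	int count;
};

struct journal_sketch {
	struct revoke_table_sketch tables[2];
	struct revoke_table_sketch *revoke; /* table the running transaction fills */
};

static void journal_init_sketch(struct journal_sketch *j)
{
	j->tables[0].count = 0;
	j->tables[1].count = 0;
	j->revoke = &j->tables[0];
}

/* Record a revoke for the running transaction; -1 if the table is full. */
static int revoke_block_sketch(struct journal_sketch *j, unsigned long blocknr)
{
	struct revoke_table_sketch *t = j->revoke;

	if (t->count == REVOKE_SLOTS)
		return -1;
	t->blocks[t->count++] = blocknr;
	return 0;
}

/* At commit: the current table now belongs to the committing
 * transaction, and new revokes go to the other table, which the
 * previous commit must already have emptied. */
static struct revoke_table_sketch *
switch_revoke_table_sketch(struct journal_sketch *j)
{
	struct revoke_table_sketch *committing = j->revoke;

	j->revoke = (committing == &j->tables[0]) ? &j->tables[1]
						  : &j->tables[0];
	assert(j->revoke->count == 0);
	return committing;
}

/* After the commit has written its revoke records, empty its table. */
static void clear_revoke_table_sketch(struct revoke_table_sketch *t)
{
	t->count = 0;
}
```

Revokes recorded before the switch stay with the committing transaction while new ones land in the other table; the assert in the switch checks the assumption that the previous commit has already emptied that table.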

[PATCH] JBD: implement j_commit_request locking · ca340395
Andrew Morton authored Jun 17, 2003
```
Implement the designed locking around journal->j_commit_request.
```
[PATCH] JBD: implement journal->j_commit_sequence locking · 6b65bc1f
Andrew Morton authored Jun 17, 2003
```
Implement the designed locking around journal->j_commit_sequence.
```
[PATCH] JBD: implement journal->j_free locking · e3a03fb8
Andrew Morton authored Jun 17, 2003
```
Implement the designed locking around journal->j_free.

Things get a lot better here, too.
```
[PATCH] JBD: implement journal->j_tail locking · 2e89f6eb
Andrew Morton authored Jun 17, 2003
```
Implement the designed locking around journal->j_tail.
```
[PATCH] JBD: implement journal->j_head locking · 23ce7898
Andrew Morton authored Jun 17, 2003
```
Implement the designed locking around journal->j_head.
```
[PATCH] JBD: implement j_checkpoint_transactions locking · 2d16ce3a
Andrew Morton authored Jun 17, 2003
```
Implement the designed locking around j_checkpoint_transactions.  It was all
pretty much there actually.
```

[PATCH] JBD: implement j_committing_transaction locking · 36c3ce5d

Andrew Morton authored Jun 17, 2003

Go through all sites which use j_committing_transaction and ensure that the
designed locking is correctly implemented there.


[PATCH] JBD: implement j_running_transaction locking · e63ebf6b

Andrew Morton authored Jun 17, 2003

Implement the designed locking around journal->j_running_transaction.

A lot more of the new locking scheme falls into place.


[PATCH] JBD: implement j_barrier_count locking · 152dede7

Andrew Morton authored Jun 17, 2003

We now start to move onto the fields of the topmost JBD data structure: the
journal.

The patch implements the designed locking around the j_barrier_count member.
And as a part of that, a lot of the new locking scheme is implemented.
Several lock_kernel()s and sleep_on()s go away.


[PATCH] JBD: implement t_jcb locking · 516e0cf7

Andrew Morton authored Jun 17, 2003

Provide the designed locking around the transaction's t_jcb callback list.

It turns out that this is wholly redundant at present.


[PATCH] JBD: implement t_outstanding_credits locking · 8c379633
Andrew Morton authored Jun 17, 2003
```
Implement the designed locking for t_outstanding_credits.
```
[PATCH] JBD: t_updates locking · 9642d82c
Andrew Morton authored Jun 17, 2003
```
Provide the designed locking for transaction_t.t_updates.
```

[PATCH] JBD: t_nr_buffers locking · 48fdf3e6

Andrew Morton authored Jun 17, 2003

Now we move more into the locking of the transaction_t fields.

t_nr_buffers locking is just an audit-and-commentary job.


[PATCH] JBD: remove journal_datalist_lock · 0a63cac6

Andrew Morton authored Jun 17, 2003

This was a system-wide spinlock.

Simple transformation: make it a filesystem-wide spinlock, in the JBD
journal.

That's a bit lame, and later it might be nice to make it per-transaction_t.
But there are interesting ranking and ordering problems with that, especially
around __journal_refile_buffer().


[PATCH] JBD: b_tnext locking · 1fe87216
Andrew Morton authored Jun 17, 2003
```
Implement the designed b_tnext locking.

This also covers b_tprev locking.
```

[PATCH] JBD: Implement b_next_transaction locking rules · e87dd8c3

Andrew Morton authored Jun 17, 2003

Go through all b_next_transaction instances and implement the locking rules.
(Nothing to do here: b_transaction locking covered it.)


[PATCH] JBD: implement b_transaction locking rules · e821ceb2
Andrew Morton authored Jun 17, 2003
```
Go through all uses of b_transaction and implement the rules.

Fairly straightforward.
```
[PATCH] JBD: implement b_committed_data locking · b07da5e5
Andrew Morton authored Jun 17, 2003
```
Implement the designed locking schema around the
journal_head.b_committed_data field.
```

[PATCH] JBD: Finish protection of journal_head.b_frozen_data · 990aef1a

Andrew Morton authored Jun 17, 2003

We now start to move across the JBD data structure's fields, from "innermost"
and outwards.

Start with journal_head.b_frozen_data, because the locking for this field was
partially implemented in jbd-010-b_committed_data-race-fix.patch.

It is protected by jbd_lock_bh_state().  We keep the lock_journal() and
spin_lock(&journal_datalist_lock) calls in place.  Later,
spin_lock(&journal_datalist_lock) is replaced by
spin_lock(&journal->j_list_lock).

Of course, this completion of the locking around b_frozen_data also puts a
lot of the locking for other fields in place.


[PATCH] JBD: rename journal_unlock_journal_head to · eacf9510

Andrew Morton authored Jun 17, 2003

journal_unlock_journal_head() is misnamed: what it does is to drop a ref on
the journal_head and free it if that ref fell to zero.  It doesn't actually
unlock anything.

Rename it to journal_put_journal_head().

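A minimal C sketch of what the renamed function does: drop one reference and free the journal_head when the count reaches zero. The types and function names are hypothetical; the field names b_private and b_jcount are borrowed for readability only, and the sketch ignores locking and any further conditions the real code may check before freeing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical journal_head: only its reference count. */
struct jhead_sketch {
	int b_jcount;
};

/* Hypothetical buffer_head: only its link to the journal_head. */
struct bufhead_sketch {
	struct jhead_sketch *b_private;
};

/* Attach a journal_head if there is none yet, then take a reference. */
static struct jhead_sketch *add_journal_head_sketch(struct bufhead_sketch *bh)
{
	if (!bh->b_private) {
		bh->b_private = calloc(1, sizeof(struct jhead_sketch));
		if (!bh->b_private)
			return NULL;
	}
	bh->b_private->b_jcount++;
	return bh->b_private;
}

/* "Put", not "unlock": drop a reference and free on the last one. */
static void put_journal_head_sketch(struct bufhead_sketch *bh)
{
	struct jhead_sketch *jh = bh->b_private;

	assert(jh && jh->b_jcount > 0);
	if (--jh->b_jcount == 0) {
		bh->b_private = NULL;	/* detach before freeing */
		free(jh);
	}
}
```

Nothing in either function locks or unlocks anything, which is the point of the new name.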

[PATCH] JBD: fine-grain journal_add_journal_head locking · 1c69516f

Andrew Morton authored Jun 17, 2003

buffer_heads and journal_heads are joined at the hip.  We need a lock to
protect the joint and its refcounts.

JBD is currently using a global spinlock for that.  Change it to use one bit
in bh->b_state.


[PATCH] JBD: remove jh_splice_lock · 6fe2ab38

Andrew Morton authored Jun 17, 2003

This was a strange spinlock which was designed to prevent another CPU from
ripping a buffer's journal_head away while this CPU was inspecting its state.

Really, we don't need it - we can inspect that state directly from bh->b_state.

So kill it off, along with a few things which used it which are themselves
not actually used any more.


[PATCH] JBD: plan JBD locking schema · 13d8498a

Andrew Morton authored Jun 17, 2003

This is the start of the JBD locking rework.

The aims of all this are to remove all lock_kernel() calls from JBD, to
remove all lock_journal() calls (the context switch rate is astonishing when
the lock_kernel()s are removed) and to remove all sleep_on() instances.




The strategy which is taken is:

a) Define the locking schema (this patch)

b) Work through every JBD data structure and implement its locking fully,
   according to the above schema.  We work from "innermost" data structures
   and outwards.

It isn't guaranteed that the filesystem will work very well at all stages of
this patch series.



In this patch:


Add commentary and various locks to jbd.h describing the locking scheme which
is about to be implemented.

Initialise the new locks.

Coding-style goodness in jbd.h


[PATCH] JBD: fix race over access to b_committed_data · 47bb09d8

Andrew Morton authored Jun 17, 2003

From: Alex Tomas <bzzz@tmi.comex.ru>

We have a race wherein the block allocator can decide that
journal_head.b_committed_data is present and then will use it. But kjournald
can concurrently free it and set the pointer to NULL. It goes oops.

We introduce per-buffer_head "spinlocking" based on a bit in b_state. To do
this we abstract out pte_chain_lock() and reuse the implementation.

The bit-based spinlocking is pretty inefficient CPU-wise (hence the warning
in there) and we may move this to a hashed spinlock later.

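A userspace analogue of a bit-based spinlock, using C11 atomics. It is not the kernel implementation (which, among other things, has to cope with preemption); BH_STATE_LOCK and the function names are hypothetical. The lock occupies one bit of a word that also carries other state bits, which keeps it small; waiters simply spin on that shared word.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit number of the lock inside the state word. */
#define BH_STATE_LOCK 5

static void bit_spin_lock_sketch(atomic_ulong *word, int bit)
{
	unsigned long mask = 1UL << bit;

	/* fetch_or returns the old value: if the bit was already set,
	 * someone else holds the lock, so wait and retry. */
	while (atomic_fetch_or_explicit(word, mask, memory_order_acquire) & mask) {
		/* Spin on plain loads until the bit looks clear, to avoid
		 * hammering the cache line with atomic writes. */
		while (atomic_load_explicit(word, memory_order_relaxed) & mask)
			;
	}
}

static void bit_spin_unlock_sketch(atomic_ulong *word, int bit)
{
	/* Clear only the lock bit; other state bits are left alone. */
	atomic_fetch_and_explicit(word, ~(1UL << bit), memory_order_release);
}
```

Other bits in the word are preserved across lock and unlock; only bit BH_STATE_LOCK changes.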

[PATCH] ext3: scalable counters and locks · 17aff938

Andrew Morton authored Jun 17, 2003

From: Alex Tomas <bzzz@tmi.comex.ru>

This is a port from ext2 of the fuzzy counters (for Orlov allocator
heuristics) and the hashed spinlocking (for the inode and block allocators).

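The idea behind a fuzzy counter can be sketched as a per-CPU batched counter. The names, the CPU count and the batch size below are hypothetical, and the sketch is single-threaded; a real implementation would keep the updater on its CPU and protect the fold into the shared value.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count */
#define FOLD_BATCH 32		/* hypothetical: fold local deltas at this size */

struct fuzzy_counter {
	long global;			/* shared value, written rarely */
	long local[NR_CPUS_SKETCH];	/* per-CPU deltas, written often */
};

/* Update this CPU's local delta; touch the shared value only when the
 * delta reaches FOLD_BATCH in either direction. */
static void fuzzy_add(struct fuzzy_counter *c, int cpu, long delta)
{
	c->local[cpu] += delta;
	if (c->local[cpu] >= FOLD_BATCH || c->local[cpu] <= -FOLD_BATCH) {
		c->global += c->local[cpu];	/* needs a lock in real code */
		c->local[cpu] = 0;
	}
}

/* Cheap, approximate read. */
static long fuzzy_read(const struct fuzzy_counter *c)
{
	return c->global;
}

/* Exact read: fold in every CPU's pending delta. */
static long fuzzy_sum(const struct fuzzy_counter *c)
{
	long sum = c->global;

	for (int i = 0; i < NR_CPUS_SKETCH; i++)
		sum += c->local[i];
	return sum;
}
```

fuzzy_read() can be off by up to NR_CPUS_SKETCH * (FOLD_BATCH - 1), which is tolerable for allocator heuristics; fuzzy_sum() gives the exact value at the cost of touching every CPU's slot.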

[PATCH] ext3: concurrent block/inode allocation · c12b9866

Andrew Morton authored Jun 17, 2003

From: Alex Tomas <bzzz@tmi.comex.ru>


This patch weans ext3 off lock_super()-based protection for the inode and
block allocators.

It's basically the same as the ext2 changes.


1) each group has its own spinlock, which is used for group counter
   modifications

2) sb->s_free_blocks_count isn't used any more.  ext2_statfs() and
   find_group_orlov() loop over groups to count free blocks

3) sb->s_free_blocks_count is recalculated at mount/umount/sync_super time
   in order to check consistency and to avoid fsck warnings

4) reserved blocks are distributed over last groups

5) ext3_new_block() tries to use non-reserved blocks first and, if that
   fails, falls back to reserved blocks

6) ext3_new_block() and ext3_free_blocks() do not modify sb->s_free_blocks,
   therefore they do not call mark_buffer_dirty() for the superblock's
   buffer_head.  This should reduce I/O a bit

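Points 1, 2, 5 and 6 above can be sketched together in C. The group layout, the reserved-block array and all function names are hypothetical; the real allocators also decide who may use reserved blocks and search bitmaps for a specific block, both of which are left out here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NGROUPS 4	/* hypothetical number of block groups */

struct group_sketch {
	atomic_bool locked;	/* per-group lock (point 1) */
	long free_blocks;	/* protected by this group's lock */
};

static void group_init(struct group_sketch *g, long free_blocks)
{
	atomic_init(&g->locked, false);
	g->free_blocks = free_blocks;
}

static void group_lock(struct group_sketch *g)
{
	while (atomic_exchange_explicit(&g->locked, true, memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void group_unlock(struct group_sketch *g)
{
	atomic_store_explicit(&g->locked, false, memory_order_release);
}

/* Take one block from g while leaving `keep` blocks untouched. */
static bool group_take(struct group_sketch *g, long keep)
{
	bool ok = false;

	group_lock(g);
	if (g->free_blocks > keep) {
		g->free_blocks--;
		ok = true;
	}
	group_unlock(g);
	return ok;
}

/* Points 5 and 6: non-reserved blocks first, reserved ones only as a
 * fallback, and no superblock counter is touched. Returns the group
 * used, or -1 if the filesystem is full. */
static int new_block_sketch(struct group_sketch *g, const long *reserved)
{
	for (int i = 0; i < NGROUPS; i++)
		if (group_take(&g[i], reserved[i]))
			return i;
	for (int i = 0; i < NGROUPS; i++)
		if (group_take(&g[i], 0))
			return i;
	return -1;
}

/* Point 2: a statfs-style total is computed by summing the groups. */
static long count_free_blocks(struct group_sketch *g)
{
	long total = 0;

	for (int i = 0; i < NGROUPS; i++) {
		group_lock(&g[i]);
		total += g[i].free_blocks;
		group_unlock(&g[i]);
	}
	return total;
}
```

Because each allocation takes only its own group's lock, allocators working in different groups do not contend, and the free-block total costs one pass over the groups instead of a shared superblock counter.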

Also fix an Orlov allocator boundary case:

In the interests of SMP scalability the ext2 free blocks and free inodes
counters are "approximate".  But there is a piece of code in the Orlov
allocator which fails due to boundary conditions on really small
filesystems.

Fix that up via a final allocation pass which simply uses first-fit for
allocation of a directory inode.

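The shape of the fix, a heuristic pass followed by a plain first-fit pass, can be sketched in C. The types, names and the min_free threshold are hypothetical stand-ins for the Orlov code's own tests, which are more involved.

```c
#include <assert.h>

/* Hypothetical group descriptor: only an approximate free-inode count. */
struct igroup_sketch {
	long free_inodes;
};

/* Stand-in for the heuristic pass: accept only groups that look
 * comfortably free. On a tiny filesystem no group may qualify. */
static int heuristic_pass(const struct igroup_sketch *g, int n, long min_free)
{
	for (int i = 0; i < n; i++)
		if (g[i].free_inodes >= min_free)
			return i;
	return -1;
}

/* Final pass: plain first-fit, any group with a free inode at all. */
static int first_fit_pass(const struct igroup_sketch *g, int n)
{
	for (int i = 0; i < n; i++)
		if (g[i].free_inodes > 0)
			return i;
	return -1;
}

static int find_dir_group_sketch(const struct igroup_sketch *g, int n,
				 long min_free)
{
	int grp = heuristic_pass(g, n, min_free);

	if (grp < 0)
		grp = first_fit_pass(g, n);	/* the added fallback */
	return grp;
}
```

With groups holding 0, 1 and 4 free inodes, a threshold of 3 picks group 2; a threshold of 5 makes the heuristic pass fail, and the first-fit pass still finds group 1.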

[PATCH] JBD: journal_get_write_access() speedup · 78f2f471
Andrew Morton authored Jun 17, 2003
```
Move some lock_kernel() calls from the caller to the callee, reducing
holdtimes.
```

[PATCH] ext3: move lock_kernel() down into the JBD layer. · 3307fbd1

Andrew Morton authored Jun 17, 2003

This is the start of the ext3 scalability rework.  It basically comes in two
halves:

- ext3 BKL/lock_super removal and scalable inode/block allocators

- JBD locking rework.

The ext3 scalability work was completed a couple of months ago.

The JBD rework has been stable for a couple of weeks now.  My gut feeling is
that there should be one, maybe two bugs left in it, but no problems have
been discovered...


Performance-wise, throughput is increased by up to 2x on a dual-CPU machine,
and 10x has been measured on a 16-way.  Given that current ext3 is able to
chew two whole CPUs spinning on locks on a 4-way, that wasn't especially
surprising.

These patches were prepared by Alex Tomas <bzzz@tmi.comex.ru> and myself.


First patch: ext3 lock_kernel() removal.

The only reason why ext3 takes lock_kernel() is because it is required by the
JBD API.

The patch removes the lock_kernel() calls from ext3 and pushes them down into
JBD itself.


Merge http://lia64.bkbits.net/to-linus-2.5 · 0d0d8534
Linus Torvalds authored Jun 17, 2003
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```

17 Jun, 2003 1 commit
ia64: Initial sync with 2.5.72. · 1626bd5b
David Mosberger authored Jun 17, 2003