Commits · 2e27bd98812fd2f07999574564bd8ce7ddfb4b9e · Kirill Smelkov / linux

14 May, 2004 40 commits

[PATCH] swap speedups and fix · 2e27bd98

Andrew Morton authored May 14, 2004

From: Andrea Arcangeli <andrea@suse.de>

I don't think we need install_swap_bdev/remove_swap_bdev anymore: we should
use swap_info->bdev, not swap_bdevs[].  The swap_info already has a
->bdev field; the only point of remove_swap_bdev/install_swap_bdev was to
unplug all devices as efficiently as possible, and we don't need that anymore
now that the page is passed as a parameter.

Plus the semaphore should be a rwsem to allow parallel unplug from multiple
pages.

After that I don't need to take the semaphore anymore during swapon: no
swapcache with swp_type() pointing to such a bdev will be allowed until swapon
is complete (SWP_ACTIVE is set a lot later, after setting p->bdev).

In swapoff I only need a dummy serialization with the readers, after
try_to_unuse is complete:

 	err = try_to_unuse(type);
 	current->flags &= ~PF_SWAPOFF;

 	/* wait for any unplug function to finish */
 	down_write(&swap_unplug_sem);
 	up_write(&swap_unplug_sem);


that's all, no other locking and no install_swap_bdev/remove_swap_bdev.

(and the swap_bdevs[] compression code was busted)


[PATCH] blk_run_page(): we don't trust bh->b_page · 4e36c118
Andrew Morton authored May 14, 2004
```
We don't trust bh->b_page to point to the right thing across all filesystems,
so revert this bit.
```
[PATCH] blk_run_page(): fixup for swap_unplug_io_fn() · 3a1e4697
Andrew Morton authored May 14, 2004


[PATCH] Add blk_run_page() · e059d5da

Andrew Morton authored May 14, 2004

From: Andrea Arcangeli <andrea@suse.de>

From: Jens Axboe

Add blk_run_page() API.  This is so that we can pass the target page all the
way down to (for example) the swap unplug function.  So swap can work out
which blockdevs back this particular page.


[PATCH] rmap-5-swap_unplug-page-revert · 485ba3c3

Andrew Morton authored May 14, 2004

Revert the pre-2.6.6 per-address-space unplugging changes. This removes a
special case for swapper_space, syncs things with Andrea and allows the
swap unplug function to be simplified.


[PATCH] rename rmap_lock to page_map_lock · c78a6f26
Andrew Morton authored May 14, 2004
```
Sync this up with Andrea's patches.
```

[PATCH] filtered wakeups: apply to buffer_head functions · 70d1f017

Andrew Morton authored May 14, 2004

From: William Lee Irwin III <wli@holomorphy.com>

This patch implements wake-one semantics for buffer_head wakeups in a single
step. The buffer_head being waited on is passed to the waiter's wakeup
function by the waker, and the wakeup function compares that to a pointer
stored in its on-stack structure and also checks the readiness of the bit
there. Wake-one semantics are achieved by using WQ_FLAG_EXCLUSIVE in the
codepaths waiting to acquire the bit for mutual exclusion.


[PATCH] filtered wakeups: apply to pagecache functions · 08aaf1cc

Andrew Morton authored May 14, 2004

From: William Lee Irwin III <wli@holomorphy.com>

This patch implements wake-one semantics for page wakeups in a single step.
Discrimination between distinct pages is achieved by passing the page to the
wakeup function, which compares it to a pointer in its own on-stack structure
containing the waitqueue element and the page. Bit discrimination is achieved
by storing the bit number in that same structure and testing the bit in the
wakeup function. Wake-one semantics are achieved by using WQ_FLAG_EXCLUSIVE
in the codepaths waiting to acquire the bit for mutual exclusion.
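
The filtering described above can be sketched in user-space C. This is only an illustration, not the patch's code: `page_sketch`, `wake_key`, `filtered_waiter`, `wait_queue_sketch`, `queue_waiter()`, `page_wake_function()` and `wake_up_filtered()` are hypothetical names, the value given to WQ_FLAG_EXCLUSIVE is arbitrary, and nothing here actually sleeps or takes locks. It only shows how a wake function can reject events for other pages or bits, and how the walk stops after the first exclusive waiter that accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01  /* kernel flag name; the value here is illustrative */
#define MAX_WAITERS 8

/* Hypothetical stand-in for struct page: just a word of flag bits. */
struct page_sketch {
    unsigned long flags;
};

/* What the waker hands to every wake function: which page, which bit. */
struct wake_key {
    const struct page_sketch *page;
    int bit_nr;
};

/* One waiter; in the kernel this would live on the sleeper's stack. */
struct filtered_waiter {
    const struct page_sketch *page;
    int bit_nr;
    unsigned int flags;
    bool woken;
};

/* A single hashed queue that waiters on different pages may share. */
struct wait_queue_sketch {
    struct filtered_waiter *slot[MAX_WAITERS];
    size_t nr;
};

static void queue_waiter(struct wait_queue_sketch *q, struct filtered_waiter *w)
{
    assert(q->nr < MAX_WAITERS);
    w->woken = false;
    q->slot[q->nr++] = w;
}

/* Accept the wakeup only if it is for our page and bit, and the bit is clear. */
static bool page_wake_function(struct filtered_waiter *w, const struct wake_key *key)
{
    if (w->page != key->page || w->bit_nr != key->bit_nr)
        return false;           /* hash collision: event is for another object */
    if (key->page->flags & (1UL << key->bit_nr))
        return false;           /* bit still set: nothing to acquire yet */
    w->woken = true;
    return true;
}

/* Walk the queue; stop after the first exclusive waiter that accepts. */
static int wake_up_filtered(struct wait_queue_sketch *q, const struct wake_key *key)
{
    int nr_woken = 0;

    for (size_t i = 0; i < q->nr; i++) {
        struct filtered_waiter *w = q->slot[i];

        if (w->woken)
            continue;
        if (page_wake_function(w, key)) {
            nr_woken++;
            if (w->flags & WQ_FLAG_EXCLUSIVE)
                break;
        }
    }
    return nr_woken;
}
```

With three exclusive waiters on one queue, one on page B and two on page A, a wakeup for page A skips the page B waiter and wakes exactly one page A waiter. The second page A waiter stays queued until the next release.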


[PATCH] filtered wakeups: wakeup enhancements · 2afafa3b

Andrew Morton authored May 14, 2004

From: William Lee Irwin III <wli@holomorphy.com>

This patch provides an additional argument to __wake_up_common() so that the
information wakefunc.patch made waiters ready to receive may be passed to them
by wakers. This is provided as a separate patch so that the overhead of the
additional argument to __wake_up_common() can be measured in isolation. No
change in performance was observable here.


[PATCH] filtered wakeups · 2f242854

Andrew Morton authored May 14, 2004

From: William Lee Irwin III <wli@holomorphy.com>

This patch series is solving the "thundering herd" problem that occurs in the
mainline implementation of hashed waitqueues.  There are two sources of
spurious wakeups in such arrangements:

(a) Hash collisions that place waiters on different objects on the same
    waitqueue, which wakes threads falsely when any of the objects hashed to
    the same queue receives a wakeup.  i.e.  loss of information about which
    object a wakeup event is related to.

(b) Loss of information about which object a given waiter is waiting on.
    This precludes wake-one semantics for mutual exclusion scenarios.  For
    instance, a lock bit may be slept on.  If there are any waiters on the
    object, a lock bit release event must wake at least one of them so as to
    prevent deadlock.  But without information as to which waiter is waiting
    on which object, we must resort to waking all waiters who could possibly
    be waiting on it.  Now, as the lock bit provides mutual exclusion, only
    one of the waiters woken can proceed, and the remainder will go back to
    sleep and wait for another event, creating unnecessary system load.  Once
    wake-one semantics are established, only one of the waiters waiting to
    acquire a lock bit needs to be woken, which measurably reduces system load
    and improves efficiency (i.e.  it's the subject of the benchmarking I've
    been sending to you).

Even beyond the measurable efficiency gains, there are reasons of robustness
and responsiveness to motivate addressing the issue of thundering herds.  In a
real-life scenario I've been personally involved in resolving, the thundering
herd issue caused powerful modern SMP machines with fast IO systems to be
unresponsive to user input for a minute at a time or more.  Analogues of these
patches for the distro kernels involved fully resolved the issue to the
customer's satisfaction and obviated workarounds to limit the pagecache's
size.

The latest spin of these patches basically shoves more pieces of the logic
into the wakeup functions, with some efficiency gains from sharing the hot
codepath with the rest of the kernel, and a slightly larger diff than the
patches with the newly-introduced entrypoint.  Writing these was motivated by
the push to insulate sched.c from more of the details of wakeup semantics by
putting more of the logic into the wakeup functions.  In order to accomplish
this while still solving (b), the wakeup functions grew a new argument,
passed by the waker, that communicates which object a wakeup event relates
to.

=========

This patch provides an additional argument to wakeup functions so that
information may be passed from the waker to the waiter.  This is provided as a
separate patch so that the overhead of the additional argument can be measured
in isolation.  No change in performance was observable here.


[PATCH] do_mounts_rd-malloc-fix · 5a930dd9

Andrew Morton authored May 14, 2004

gcc-3.4.0 sez:

init/do_mounts_rd.c:309: warning: conflicting types for built-in function 'malloc'


[PATCH] VM accounting fix · e46bdb8d

Andrew Morton authored May 14, 2004

From: Hugh Dickins <hugh@veritas.com>

Stas Sergeev <stsp@aknet.ru> wrote:

   mprotect() fails to merge VMAs because one VMA can end up with
   VM_ACCOUNT flag set, and another without that flag.  That makes several
   apps of mine malfunction.


Great find!  Someone has got their test the wrong way round.  Since that
VM_MAYACCT macro is being used in one place only, and just hiding what it's
actually about, fold it into its callsite.


[PATCH] revert the process-migration-speedup patch · 64525acc

Andrew Morton authored May 14, 2004

David Mosberger asked that this be backed out:

"I do not believe that flushing the TLB before migration is the right thing
to do on ia64 machines which support global TLB purges (i.e., all but SGI's
machines)."

It was of huge benefit for the SGI machines, so work is ongoing.


[PATCH] MSEC_TO_JIFFIES to msec_to_jiffies · 5975a1db

Andrew Morton authored May 14, 2004

Switch all users of MSEC[S]_TO_JIFFIES and JIFFIES_TO_MSEC[S] over to use
jiffies_to_msecs() and msecs_to_jiffies(). Withdraw MSECS_TO_JIFFIES() and
JIFFIES_TO_MSECS() from the kernel API.


[PATCH] Covert drivers to use msec_to_jiffies · b3dafee7

Andrew Morton authored May 14, 2004

Remove various private implementations of msecs_to_jiffies() and
jiffies_to_msecs().

There are various uppercase versions which should be consolidated.


[PATCH] MSEC_TO_JIFFIES consolidation · 5b59eadf

Andrew Morton authored May 14, 2004

From: Ingo Molnar <mingo@elte.hu>

We have various different implementations of MSEC[S]_TO_JIFFIES and
JIFFIES_TO_MSEC[S].  We recently had a compile-time clash in USB.

Fix all that up.

- The SCTP version was very inefficient.  Hopefully this version is accurate
  enough.

- Optimise for the HZ=100 and HZ=1000 cases

- This version does round-up, so sleep(9 milliseconds) works OK on 100HZ.

- We still have lots of jiffies_to_msec and msec_to_jiffies implementations.

From: William Lee Irwin III <wli@holomorphy.com>

  Optimize the cases where HZ is a divisor of 1000 or vice-versa in
  JIFFIES_TO_MSECS() and MSECS_TO_JIFFIES() by allowing the nonvanishing(!)
  integral ratios to appear as parenthesized expressions eligible for
  constant folding optimizations.

From: me

  Use typesafe inlines for the jiffies-to-millisecond conversion functions.

  This means that milliseconds officially takes the type `unsigned int'.
  All current callers seem to be OK with that.

  Drivers need to be fixed up to use this instead of their private versions.
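
A minimal sketch of what such typesafe inlines might look like, in plain user-space C. This is an assumption-laden illustration, not the code from the patch: SKETCH_HZ is a hypothetical stand-in for the kernel's HZ (fixed at 100 here), the overflow assert is added for the sketch only, and the branch structure simply follows the points listed above (constant-foldable integral ratios, round-up when converting milliseconds to jiffies).

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_HZ 100   /* hypothetical stand-in for HZ; an assumption of this sketch */

/* Convert jiffies to milliseconds; the result type is unsigned int. */
static inline unsigned int jiffies_to_msecs(unsigned long j)
{
#if SKETCH_HZ <= 1000 && !(1000 % SKETCH_HZ)
    /* HZ divides 1000: (1000 / HZ) is a constant the compiler can fold */
    assert(j <= UINT_MAX / (1000 / SKETCH_HZ));
    return (unsigned int)((1000 / SKETCH_HZ) * j);
#elif SKETCH_HZ > 1000 && !(SKETCH_HZ % 1000)
    /* 1000 divides HZ: divide by the constant ratio, rounding up */
    return (unsigned int)((j + (SKETCH_HZ / 1000) - 1) / (SKETCH_HZ / 1000));
#else
    return (unsigned int)((j * 1000) / SKETCH_HZ);
#endif
}

/* Convert milliseconds to jiffies, rounding up so short sleeps never become 0. */
static inline unsigned long msecs_to_jiffies(unsigned int m)
{
#if SKETCH_HZ <= 1000 && !(1000 % SKETCH_HZ)
    return ((unsigned long)m + (1000 / SKETCH_HZ) - 1) / (1000 / SKETCH_HZ);
#elif SKETCH_HZ > 1000 && !(SKETCH_HZ % 1000)
    return (unsigned long)m * (SKETCH_HZ / 1000);
#else
    return ((unsigned long)m * SKETCH_HZ + 999) / 1000;
#endif
}
```

With the assumed tick rate of 100, a 9 millisecond request rounds up to one jiffy instead of truncating to zero, which is the behaviour the round-up bullet above describes.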


[PATCH] sched: add missing local_irq_enable() · 1b104df1

Andrew Morton authored May 14, 2004

From: Nick Piggin <nickpiggin@yahoo.com.au>

this_rq_lock does a local_irq_disable, and sched_yield() needs to undo that.


[PATCH] Fix page double-freeing race · f68e7a55

Andrew Morton authored May 14, 2004

This has been there for nearly two years.  See bugzilla #1403

vmscan.c does, in two places:

	spin_lock(&zone->lru_lock);
	page = lru_to_page(&zone->inactive_list);
	if (page_count(page) == 0) {
		/* erk, it's being freed by __page_cache_release() or
		 * release_pages()
		 */
		put_it_back_on_the_lru();

	} else {

	--> window 1 <--

		page_cache_get(page);
		put_it_on_private_list();
	}
	spin_unlock(&zone->lru_lock);

	use_the_private_list();

	page_cache_release(page);



whereas __page_cache_release() and release_pages() do:

	if (put_page_testzero(page)) {

	--> window 2 <--

		spin_lock(&zone->lru_lock);
		if (page_count(page) == 0) {
			remove_it_from_the_lru();
			really_free_the_page();
		}
		spin_unlock(&zone->lru_lock);
	}


The race occurs if the vmscan.c path sees page_count()==1 and then the
page_cache_release() path happens in that few-instruction "window 1" before
vmscan's page_cache_get().

The page_cache_release() path does put_page_testzero(), which returns true.
Then this CPU takes an interrupt...

The vmscan.c path then does page_cache_get(), taking the refcount to one.
Then it uses the page and does page_cache_release(), taking the refcount to
zero and the page is really freed.

Now, the CPU running page_cache_release() returns from the interrupt, takes
the LRU lock, sees the page still has a refcount of zero and frees it again.
Boom.


The patch fixes this by closing "window 1".  We provide a
"get_page_testone()" which grabs a ref on the page and returns true if the
refcount was previously zero.  If that happens the vmscan.c code simply drops
the page's refcount again and leaves the page on the LRU.

All this happens under the zone->lru_lock, which is also taken by
__page_cache_release() and release_pages(), so the vmscan code knows that the
page has not been returned to the page allocator yet.


In terms of implementation, the page counts are now offset by one: a free page
has page->_count of -1.  This is so that we can use atomic_add_negative() and
atomic_inc_and_test() to provide put_page_testzero() and get_page_testone().

The macros hide all of this so the public interpretation of page_count() and
set_page_count() remains unaltered.
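
The offset-by-one scheme can be sketched with C11 atomics. This is a user-space illustration under stated assumptions, not the kernel implementation: `page_sketch` and `try_isolate_page()` are hypothetical names, `atomic_fetch_add()`/`atomic_fetch_sub()` stand in for the kernel's atomic_inc_and_test() and atomic_add_negative(), and the LRU lock that the scanner is assumed to hold is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page. */
struct page_sketch {
    atomic_int _count;  /* stored value is (references - 1); -1 means free */
};

/* Public view of the count: callers never see the offset. */
static inline int page_count(struct page_sketch *p)
{
    return atomic_load(&p->_count) + 1;
}

static inline void set_page_count(struct page_sketch *p, int v)
{
    assert(v >= 0);
    atomic_store(&p->_count, v - 1);
}

/* Drop a reference; true if it was the last one (stored value went negative). */
static inline bool put_page_testzero(struct page_sketch *p)
{
    return atomic_fetch_sub(&p->_count, 1) - 1 < 0;
}

/* Take a reference; true if the count was previously zero (stored -1 -> 0). */
static inline bool get_page_testone(struct page_sketch *p)
{
    return atomic_fetch_add(&p->_count, 1) + 1 == 0;
}

/*
 * Scanner side, assumed to be called with the LRU lock held.
 * Returns true if the page was pinned and may be moved to a private list.
 */
static inline bool try_isolate_page(struct page_sketch *p)
{
    if (get_page_testone(p)) {
        /* Someone is freeing the page: drop our ref, leave it on the LRU. */
        atomic_fetch_sub(&p->_count, 1);
        return false;
    }
    return true;
}
```

Because taking the reference and testing the old value happen in one atomic operation, there is no gap between "check the count" and "take the reference" for a concurrent release to slip into.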

The compiler can usually constant-fold the offsetting of page->count.  This
patch increases an x86 SMP kernel's text by 32 bytes.

The patch renames page->count to page->_count to break callers who aren't
using the macros.

This patch requires that the architecture implement atomic_add_negative().  It
is currently present on

	arm
	arm26
	i386
	ia64
	mips
	ppc
	s390
	v850
	x86_64

ppc implements this as

#define atomic_add_negative(a, v)	(atomic_add_return((a), (v)) < 0)

and atomic_add_return() is implemented on

	alpha
	cris
	h8300
	ia64
	m68knommu
	mips
	parisc
	ppc
	ppc64
	s390
	sh
	sparc
	v850

so we're looking pretty good.


[PATCH] sparc64: implement atomic_add_negative() · 2a43abd3
Andrew Morton authored May 14, 2004

[PATCH] ia64 atomic_inc_and_test fix · bcd599c6
Andrew Morton authored May 14, 2004
```
From: David Mosberger <davidm@napali.hpl.hp.com>
```

[PATCH] alpha: atomic_inc_and_test() · f3b6590c

Andrew Morton authored May 14, 2004

From: Ivan Kokshaysky <ink@jurassic.park.msu.ru>

It seems atomic_inc_and_test() is missing on alpha.


[PATCH] Implement atomic_inc_and_test() on various architectures · 52d38af5
Andrew Morton authored May 14, 2004
```
It's easy to do when the arch provides atomic_inc_return().
```

[PATCH] Implement atomic_add_negative() on various architectures · ebc7bc42

Andrew Morton authored May 14, 2004

Lots of architectures have atomic_add_return() and no atomic_add_negative().

We can implement the latter in terms of the former.


[PATCH] Make users of page->count use the provided macros · b5fc1438

Andrew Morton authored May 14, 2004

I'm about to change the meaning (and name) of page->count.  Go through and fix
up all those places which are open-coding references to it.


Merge redhat.com:/spare/repo/linux-2.6 · 21adfc14
Jeff Garzik authored May 14, 2004
```
into redhat.com:/spare/repo/net-drivers-2.6
```
Merge bk://kernel.bkbits.net/gregkh/linux/i2c-2.6 · 0cd6f6c1
Linus Torvalds authored May 14, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```
Merge bk://kernel.bkbits.net/gregkh/linux/driver-2.6 · a0d4e51c
Linus Torvalds authored May 14, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```
Merge bk://kernel.bkbits.net/gregkh/linux/usb-2.6 · be9bc8d1
Linus Torvalds authored May 14, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```
Merge kroah.com:/home/greg/linux/BK/bleed-2.6 · 3004c70f
Greg Kroah-Hartman authored May 14, 2004
```
into kroah.com:/home/greg/linux/BK/driver-2.6
```
Merge kroah.com:/home/greg/linux/BK/bleed-2.6 · 744804bc
Greg Kroah-Hartman authored May 14, 2004
```
into kroah.com:/home/greg/linux/BK/i2c-2.6
```
[PATCH] I2C: Missed ixp42x -> ixp4xx conversion · 558fcd72
Deepak Saxena authored May 14, 2004
```
Forgot to include this with my original patch a few weeks ago...
```

[PATCH] I2C: "probe" module param broken for it87 in Linux 2.6.6 · 4b36b0cf

Bjørn Mork authored May 14, 2004

Jean Delvare <khali@linux-fr.org> writes:
> So I'd suggest that you simply use the standard exit sequence in the
> it87 driver (the second one in your current patch). A patch for the 2.4
> driver would be appreciated as well.

OK.  I've attached a new version of the patch against linux-2.6.6.
I'll send a patch against current lm_sensors CVS removing the extra
exit command in a separate mail.

Greg KH <greg@kroah.com> writes:
> On Wed, May 12, 2004 at 04:38:03PM +0200, Bjørn Mork wrote:
>> +	if (!it87_find(&addr)) {
>> +		printk("it87.o: new ISA address: 0x%04x\n", addr);
>
> That printk is wrong (no KERN_ level, or dev_printk() style use).
> Please fix it in your next revision of this patch.

Errh, I just added it to document my sloppiness.  It was never meant
to be in the patch I sent you.  Sorry.  Removed in the attached patch.
The style of these drivers seems to be "just working, making no noise"
so I assume informational printk's are unwanted.


[PATCH] I2C: ICH6/6300ESB i2c support · 5fc5be30

Jason D. Gaston authored May 14, 2004

This patch adds DID support for ICH6 and 6300ESB to i2c-i801.c (SMBus).
In order to add this support I needed to patch pci_ids.h with the SMBus
DIDs. To keep things organized I renumbered the ICH6 and ESB entries
in pci_ids.h. I then patched the piix IDE and i810 audio drivers to
reflect the updated #defines. I also removed an error from irq.c;
there was a reference to a 6300ESB DID that does not exist.


USB: convert visor to use module_param() · 8382d1fb
Greg Kroah-Hartman authored May 14, 2004

USB: convert pl2303 to use module_param() · f6442a84
Greg Kroah-Hartman authored May 14, 2004

USB: change usbserial core to use module_param() · 02eb8c10
Greg Kroah-Hartman authored May 14, 2004

Merge bk://linux-acpi.bkbits.net/linux-acpi-release-2.6.6 · 5458096c
Linus Torvalds authored May 14, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```

[PATCH] USB: hcd-pci suspend tweak · e044323a

David Brownell authored May 13, 2004

I needed this to get an APM + UHCI config to behave on resume.
Applies against your BK of last night ... OHCI and EHCI do
some of this manually, they could be simplified later.


[PATCH] sysfs_rename_dir-cleanup · 51c0c34c

Maneesh Soni authored May 13, 2004

o The following patch cleans up sysfs_rename_dir(). It now checks the
return code of kobject_set_name() and propagates the error code to its
callers. Because of this there are changes in the following two APIs. Both
return int instead of void.

int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
int kobject_rename(struct kobject * kobj, char *new_name)


[PATCH] add ibmasm driver warning message · f00beb55
Max Asbock authored May 13, 2004
```
[note, I changed this a bit to be nicer on the system log, greg k-h]
```