1. 14 May, 2004 40 commits
    • [PATCH] ia64 cpu hotplug: core kernel initialisation · 8fe08444
      Andrew Morton authored
      From: Ashok Raj <ashok.raj@intel.com>
      
      This patch changes __init to __devinit for init_idle so that when a new CPU
      arrives, it can call these functions at a later time.
    • [PATCH] swap speedups and fix · 2e27bd98
      Andrew Morton authored
      From: Andrea Arcangeli <andrea@suse.de>
      
      I don't think we need install_swap_bdev/remove_swap_bdev anymore; we should
      use swap_info->bdev, not swap_bdevs.  The swap_info already has a
      ->bdev field.  The only point of remove_swap_bdev/install_swap_bdev was to
      unplug all devices as efficiently as possible, and we don't need that anymore
      with the page parameter.
      
      Plus the semaphore should be a rwsem to allow parallel unplug from multiple
      pages.
      
      After that I don't need to take the semaphore during swapon anymore: no
      swapcache with swp_type() pointing to such a bdev will be allowed until swapon
      is complete (SWP_ACTIVE is set a lot later, after setting p->bdev).
      
      In swapoff I only need a dummy serialization with the readers, after
      try_to_unuse is complete:
      
       	err = try_to_unuse(type);
       	current->flags &= ~PF_SWAPOFF;
      
       	/* wait for any unplug function to finish */
       	down_write(&swap_unplug_sem);
       	up_write(&swap_unplug_sem);
      
      
      That's all: no other locking and no install_swap_bdev/remove_swap_bdev.
      
      (and the swap_bdevs[] compression code was busted)
    • [PATCH] blk_run_page(): we don't trust bh->b_page · 4e36c118
      Andrew Morton authored
      We don't trust bh->b_page to point to the right thing across all filesystems,
      so revert this bit.
    • Andrew Morton's avatar
      3a1e4697
    • [PATCH] Add blk_run_page() · e059d5da
      Andrew Morton authored
      From: Andrea Arcangeli <andrea@suse.de>
      
      From: Jens Axboe
      
      Add blk_run_page() API.  This is so that we can pass the target page all the
      way down to (for example) the swap unplug function.  So swap can work out
      which blockdevs back this particular page.
    • [PATCH] rmap-5-swap_unplug-page-revert · 485ba3c3
      Andrew Morton authored
      Revert the pre-2.6.6 per-address-space unplugging changes.  This removes a
      swapper_space exceptionality, syncs things with Andrea and provides for
      simplification of the swap unplug function.
    • [PATCH] rename rmap_lock to page_map_lock · c78a6f26
      Andrew Morton authored
      Sync this up with Andrea's patches.
    • [PATCH] filtered wakeups: apply to buffer_head functions · 70d1f017
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      This patch implements wake-one semantics for buffer_head wakeups in a single
      step.  The buffer_head being waited on is passed to the waiter's wakeup
      function by the waker, and the wakeup function compares it to a pointer
      stored in its on-stack structure and also checks the readiness of the bit
      there.  Wake-one semantics are achieved by using WQ_FLAG_EXCLUSIVE in the
      codepaths waiting to acquire the bit for mutual exclusion.
    • [PATCH] filtered wakeups: apply to pagecache functions · 08aaf1cc
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      This patch implements wake-one semantics for page wakeups in a single step.
      Discrimination between distinct pages is achieved by passing the page to the
      wakeup function, which compares it to a pointer in its own on-stack structure
      containing the waitqueue element and the page.  Bit discrimination is achieved
      by storing the bit number in that same structure and testing the bit in the
      wakeup function.  Wake-one semantics are achieved by using WQ_FLAG_EXCLUSIVE
      in the codepaths waiting to acquire the bit for mutual exclusion.
    • [PATCH] filtered wakeups: wakeup enhancements · 2afafa3b
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      This patch provides an additional argument to __wake_up_common() so that the
      information wakefunc.patch made waiters ready to receive may be passed to them
      by wakers.  This is provided as a separate patch so that the overhead of the
      additional argument to __wake_up_common() can be measured in isolation.  No
      change in performance was observable here.
    • [PATCH] filtered wakeups · 2f242854
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      This patch series is solving the "thundering herd" problem that occurs in the
      mainline implementation of hashed waitqueues.  There are two sources of
      spurious wakeups in such arrangements:
      
      (a) Hash collisions that place waiters on different objects on the same
          waitqueue, which wakes threads falsely when any of the objects hashed to
          the same queue receives a wakeup.  i.e.  loss of information about which
          object a wakeup event is related to.
      
      (b) Loss of information about which object a given waiter is waiting on.
          This precludes wake-one semantics for mutual exclusion scenarios.  For
          instance, a lock bit may be slept on.  If there are any waiters on the
          object, a lock bit release event must wake at least one of them so as to
          prevent deadlock.  But without information as to which waiter is waiting
          on which object, we must resort to waking all waiters who could possibly
          be waiting on it.  Now, as the lock bit provides mutual exclusion, only
          one of the waiters woken can proceed, and the remainder will go back to
          sleep and wait for another event, creating unnecessary system load.  Once
          wake-one semantics are established, only one of the waiters waiting to
          acquire a lock bit needs to be woken, which measurably reduces system load
          and improves efficiency (i.e.  it's the subject of the benchmarking I've
          been sending to you).
      
      Even beyond the measurable efficiency gains, there are reasons of robustness
      and responsiveness to motivate addressing the issue of thundering herds.  In a
      real-life scenario I've been personally involved in resolving, the thundering
      herd issue caused powerful modern SMP machines with fast IO systems to be
      unresponsive to user input for a minute at a time or more.  Analogues of these
      patches for the distro kernels involved fully resolved the issue to the
      customer's satisfaction and obviated workarounds to limit the pagecache's
      size.
      
      The latest spin of these patches basically shoves more pieces of the logic
      into the wakeup functions, with some efficiency gains from sharing the hot
      codepath with the rest of the kernel, and a slightly larger diff than the
      patches with the newly-introduced entrypoint.  Writing these was motivated by
      the push to insulate sched.c from more of the details of wakeup semantics by
      putting more of the logic into the wakeup functions.  In order to accomplish
      this while still solving (b), the wakeup functions grew a new argument, passed
      by the waker, that communicates which object a wakeup event is related to.
      
      =========
      
      This patch provides an additional argument to wakeup functions so that
      information may be passed from the waker to the waiter.  This is provided as a
      separate patch so that the overhead of the additional argument can be measured
      in isolation.  No change in performance was observable here.
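
The idea of passing a key from waker to waiter can be sketched outside the kernel. This is a single-threaded model under stated assumptions, not the scheduler code: struct waiter, wake_if_match, wake_up_keyed and WQ_EXCLUSIVE are hypothetical names chosen for illustration (WQ_EXCLUSIVE plays the role of WQ_FLAG_EXCLUSIVE), and "waking" is reduced to setting a flag.

```c
#include <stddef.h>

#define WQ_EXCLUSIVE 0x01 /* plays the role of WQ_FLAG_EXCLUSIVE */

struct waiter;
typedef int (*wake_fn)(struct waiter *w, void *key);

struct waiter {
    unsigned int flags;
    void *object;        /* the object this waiter is sleeping on */
    int woken;           /* set to 1 when the waiter is "woken" */
    wake_fn func;        /* per-waiter wakeup function */
    struct waiter *next; /* next waiter on the shared (hashed) queue */
};

/* Filtering wakeup function: a wakeup whose key names a different
 * object is ignored, which addresses hash collisions (case a). */
static int wake_if_match(struct waiter *w, void *key)
{
    if (w->object != key)
        return 0;
    w->woken = 1;
    return 1;
}

/* Walk the shared queue and hand the key to every wakeup function.
 * Stop once nr_exclusive exclusive waiters have really been woken,
 * which gives wake-one behaviour (case b). Returns the number woken. */
static int wake_up_keyed(struct waiter *head, int nr_exclusive, void *key)
{
    int woken = 0;

    for (struct waiter *w = head; w != NULL; w = w->next) {
        if (w->func(w, key)) {
            woken++;
            if ((w->flags & WQ_EXCLUSIVE) && --nr_exclusive == 0)
                break;
        }
    }
    return woken;
}
```

With two exclusive waiters on one object and a third waiter on another object sharing the queue, a keyed wakeup with nr_exclusive set to 1 wakes exactly one of the two matching waiters and leaves the unrelated waiter asleep.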
    • [PATCH] do_mounts_rd-malloc-fix · 5a930dd9
      Andrew Morton authored
      gcc-3.4.0 sez:
      
      init/do_mounts_rd.c:309: warning: conflicting types for built-in function 'malloc'
    • [PATCH] VM accounting fix · e46bdb8d
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      Stas Sergeev <stsp@aknet.ru> wrote:
      
         mprotect() fails to merge VMAs because one VMA can end up with
         VM_ACCOUNT flag set, and another without that flag.  That makes several
         apps of mine malfunction.
      
      
      Great find!  Someone has got their test the wrong way round.  Since that
      VM_MAYACCT macro is being used in one place only, and just hiding what it's
      actually about, fold it into its callsite.
    • [PATCH] revert the process-migration-speedup patch · 64525acc
      Andrew Morton authored
      David Mosberger asked that this be backed out:
      
      "I do not believe that flushing the TLB before migration is the right thing
      to do on ia64 machines which support global TLB purges (i.e., all but SGI's
      machines)."
      
      It was of huge benefit for the SGI machines, so work is ongoing.
    • [PATCH] MSEC_TO_JIFFIES to msec_to_jiffies · 5975a1db
      Andrew Morton authored
      Switch all users of MSEC[S]_TO_JIFFIES and JIFFIES_TO_MSEC[S] over to use
      jiffies_to_msecs() and msecs_to_jiffies().  Withdraw MSECS_TO_JIFFIES() and
      JIFFIES_TO_MSECS() from the kernel API.
    • [PATCH] Convert drivers to use msec_to_jiffies · b3dafee7
      Andrew Morton authored
      Remove various private implementations of msecs_to_jiffies() and
      jiffies_to_msecs().
      
      There are various uppercase versions which should be consolidated.
    • [PATCH] MSEC_TO_JIFFIES consolidation · 5b59eadf
      Andrew Morton authored
      From: Ingo Molnar <mingo@elte.hu>
      
      We have various different implementations of MSEC[S]_TO_JIFFIES and
      JIFFIES_TO_MSEC[S].  We recently had a compile-time clash in USB.
      
      Fix all that up.
      
      - The SCTP version was very inefficient.  Hopefully this version is accurate
        enough.
      
      - Optimise for the HZ=100 and HZ=1000 cases
      
      - This version does round-up, so sleep(9 milliseconds) works OK on 100HZ.
      
      - We still have lots of jiffies_to_msec and msec_to_jiffies implementations.
      
      From: William Lee Irwin III <wli@holomorphy.com>
      
        Optimize the cases where HZ is a divisor of 1000 or vice-versa in
        JIFFIES_TO_MSECS() and MSECS_TO_JIFFIES() by allowing the nonvanishing(!)
        integral ratios to appear as parenthesized expressions eligible for
        constant folding optimizations.
      
      From: me
      
        Use typesafe inlines for the jiffies-to-millisecond conversion functions.
      
        This means that milliseconds officially takes the type `unsigned int'.
        All current callers seem to be OK with that.
      
        Drivers need to be fixed up to use this instead of their private versions.
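
The round-up behaviour and the constant-foldable ratios described above can be sketched as follows. This is an illustration under stated assumptions rather than the code from the patch: HZ is fixed at 100 here, and the three-way split on HZ is one plausible way to express the optimisation.

```c
#define HZ 100 /* assumed tick rate for this sketch */

/* Jiffies to milliseconds. The result type is unsigned int, matching the
 * statement that milliseconds take the type `unsigned int'. */
static inline unsigned int jiffies_to_msecs(unsigned long j)
{
#if HZ <= 1000 && !(1000 % HZ)
    /* HZ divides 1000: (1000 / HZ) is an integral compile-time constant. */
    return (1000 / HZ) * j;
#elif HZ > 1000 && !(HZ % 1000)
    /* 1000 divides HZ: divide by the constant ratio, rounding up. */
    return (j + (HZ / 1000) - 1) / (HZ / 1000);
#else
    return (j * 1000) / HZ;
#endif
}

/* Milliseconds to jiffies, rounding up so that a short sleep never
 * becomes a zero-tick sleep. */
static inline unsigned long msecs_to_jiffies(unsigned int m)
{
#if HZ <= 1000 && !(1000 % HZ)
    return (m + (1000 / HZ) - 1) / (1000 / HZ);
#elif HZ > 1000 && !(HZ % 1000)
    return m * (HZ / 1000);
#else
    return (m * HZ + 999) / 1000;
#endif
}
```

At HZ=100 a tick is 10 ms, so msecs_to_jiffies(9) rounds up to one jiffy instead of truncating to zero, which is the "sleep(9 milliseconds) works OK" case mentioned in the list.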
    • [PATCH] sched: add missing local_irq_enable() · 1b104df1
      Andrew Morton authored
      From: Nick Piggin <nickpiggin@yahoo.com.au>
      
      this_rq_lock does a local_irq_disable, and sched_yield() needs to undo that.
    • [PATCH] Fix page double-freeing race · f68e7a55
      Andrew Morton authored
      This has been there for nearly two years.  See bugzilla #1403
      
      vmscan.c does, in two places:
      
      	spin_lock(zone->lru_lock);
      	page = lru_to_page(&zone->inactive_list);
      	if (page_count(page) == 0) {
      		/* erk, it's being freed by __page_cache_release() or
      		 * release_pages()
      		 */
      		put_it_back_on_the_lru();
      
      	} else {
      
      	--> window 1 <--
      
      		page_cache_get(page);
      		put_it_on_private_list();
      	}
      	spin_unlock(zone->lru_lock);
      
      	use_the_private_list();
      
      	page_cache_release(page);
      
      
      
      whereas __page_cache_release() and release_pages() do:
      
      	if (put_page_testzero(page)) {
      
      	--> window 2 <--
      
      		spin_lock(zone->lru_lock);
      		if (page_count(page) == 0) {
      			remove_it_from_the_lru();
      			really_free_the_page();
      		}
      		spin_unlock(zone->lru_lock);
      	}
      
      
      The race occurs if the vmscan.c path sees page_count()==1 and then the
      page_cache_release() path happens in that few-instruction "window 1" before
      vmscan's page_cache_get().
      
      The page_cache_release() path does put_page_testzero(), which returns true.
      Then this CPU takes an interrupt...
      
      The vmscan.c path then does page_cache_get(), taking the refcount to one.
      Then it uses the page and does page_cache_release(), taking the refcount to
      zero and the page is really freed.
      
      Now, the CPU running page_cache_release() returns from the interrupt, takes
      the LRU lock, sees the page still has a refcount of zero and frees it again.
      Boom.
      
      
      The patch fixes this by closing "window 1".  We provide a
      "get_page_testone()" which grabs a ref on the page and returns true if the
      refcount was previously zero.  If that happens the vmscan.c code simply drops
      the page's refcount again and leaves the page on the LRU.
      
      All this happens under the zone->lru_lock, which is also taken by
      __page_cache_release() and release_pages(), so the vmscan code knows that the
      page has not been returned to the page allocator yet.
      
      
      In terms of implementation, the page counts are now offset by one: a free page
      has page->_count of -1.  This is so that we can use atomic_add_negative() and
      atomic_inc_and_test() to provide put_page_testzero() and get_page_testone().
      
      The macros hide all of this so the public interpretation of page_count() and
      set_page_count() remains unaltered.
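
The offset-by-one scheme can be sketched in user space with C11 atomics. This is a minimal model, assuming <stdatomic.h> in place of the kernel's atomic_t operations; the function names follow the text, but the bodies are illustrative and are not the macros from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* _count stores (number of references - 1), so a free page holds -1. */
struct page {
    atomic_int _count;
};

/* The public view hides the offset: callers still see 0 for a free page. */
static int page_count(struct page *p)
{
    return atomic_load(&p->_count) + 1;
}

static void set_page_count(struct page *p, int v)
{
    atomic_store(&p->_count, v - 1);
}

/* Drop a reference. True if it was the last one, i.e. the stored value
 * went negative (the job atomic_add_negative() does in the text). */
static bool put_page_testzero(struct page *p)
{
    return atomic_fetch_sub(&p->_count, 1) - 1 < 0;
}

/* Take a reference. True if the count was zero beforehand, i.e. the
 * stored value went from -1 to 0 (the job of atomic_inc_and_test()). */
static bool get_page_testone(struct page *p)
{
    return atomic_fetch_add(&p->_count, 1) + 1 == 0;
}
```

Because the increment and the "was it zero?" test are one atomic step, a scanner holding the LRU lock can detect a page that is already on its way to being freed, drop the reference again and leave the page alone.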
      
      The compiler can usually constant-fold the offsetting of page->count.  This
      patch increases an x86 SMP kernel's text by 32 bytes.
      
      The patch renames page->count to page->_count to break callers who aren't
      using the macros.
      
      This patch requires that the architecture implement atomic_add_negative().  It
      is currently present on
      
      	arm
      	arm26
      	i386
      	ia64
      	mips
      	ppc
      	s390
      	v850
      	x86_64
      
      ppc implements this as
      
      #define atomic_add_negative(a, v)	(atomic_add_return((a), (v)) < 0)
      
      and atomic_add_return() is implemented on
      
      	alpha
      	cris
      	h8300
      	ia64
      	m68knommu
      	mips
      	parisc
      	ppc
      	ppc64
      	s390
      	sh
      	sparc
      	v850
      
      so we're looking pretty good.
    • Andrew Morton's avatar
      2a43abd3
    • [PATCH] ia64 atomic_inc_and_test fix · bcd599c6
      Andrew Morton authored
      From: David Mosberger <davidm@napali.hpl.hp.com>
    • [PATCH] alpha: atomic_inc_and_test() · f3b6590c
      Andrew Morton authored
      From: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      
      It seems atomic_inc_and_test() is missing on alpha.
    • [PATCH] Implement atomic_inc_and_test() on various architectures · 52d38af5
      Andrew Morton authored
      It's easy to do when the arch provides atomic_inc_return().
    • [PATCH] Implement atomic_add_negative() on various architectures · ebc7bc42
      Andrew Morton authored
      Lots of architectures have atomic_add_return() and no atomic_add_negative().
      
      We can implement the latter in terms of the former.
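
A sketch of that derivation, assuming a user-space atomic_t built on C11 atomics (the typedef and the atomic_add_return() body here are stand-ins, not any architecture's real implementation). The same primitive also yields atomic_inc_and_test().

```c
#include <stdatomic.h>

/* User-space stand-in for the kernel's atomic_t (assumption). */
typedef struct {
    atomic_int counter;
} atomic_t;

/* Add i to v and return the new value. */
static int atomic_add_return(int i, atomic_t *v)
{
    return atomic_fetch_add(&v->counter, i) + i;
}

/* True if the result of the addition is negative. */
#define atomic_add_negative(i, v) (atomic_add_return((i), (v)) < 0)

/* True if the result of the increment is zero. */
#define atomic_inc_and_test(v) (atomic_add_return(1, (v)) == 0)
```

Both macros test the value returned by the single atomic addition, so no second read of the counter is needed.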
    • [PATCH] Make users of page->count use the provided macros · b5fc1438
      Andrew Morton authored
      I'm about to change the meaning (and name) of page->count.  Go through and fix
      up all those places which are open-coding references to it.
    • Merge redhat.com:/spare/repo/linux-2.6 · 21adfc14
      Jeff Garzik authored
      into redhat.com:/spare/repo/net-drivers-2.6
    • Merge bk://kernel.bkbits.net/gregkh/linux/i2c-2.6 · 0cd6f6c1
      Linus Torvalds authored
      into ppc970.osdl.org:/home/torvalds/v2.6/linux
    • Merge bk://kernel.bkbits.net/gregkh/linux/driver-2.6 · a0d4e51c
      Linus Torvalds authored
      into ppc970.osdl.org:/home/torvalds/v2.6/linux
    • Merge bk://kernel.bkbits.net/gregkh/linux/usb-2.6 · be9bc8d1
      Linus Torvalds authored
      into ppc970.osdl.org:/home/torvalds/v2.6/linux
    • Merge kroah.com:/home/greg/linux/BK/bleed-2.6 · 3004c70f
      Greg Kroah-Hartman authored
      into kroah.com:/home/greg/linux/BK/driver-2.6
    • Merge kroah.com:/home/greg/linux/BK/bleed-2.6 · 744804bc
      Greg Kroah-Hartman authored
      into kroah.com:/home/greg/linux/BK/i2c-2.6
    • [PATCH] I2C: Missed ixp42x -> ixp4xx conversion · 558fcd72
      Deepak Saxena authored
      Forgot to include this with my original patch a few weeks ago...
    • [PATCH] I2C: "probe" module param broken for it87 in Linux 2.6.6 · 4b36b0cf
      Bjørn Mork authored
      Jean Delvare <khali@linux-fr.org> writes:
      > So I'd suggest that you simply use the standard exit sequence in the
      > it87 driver (the second one in your current patch). A patch for the 2.4
      > driver would be appreciated as well.
      
      OK.  I've attached a new version of the patch against linux-2.6.6.
      I'll send a patch against current lm_sensors CVS removing the extra
      exit command in a separate mail.
      
      Greg KH <greg@kroah.com> writes:
      > On Wed, May 12, 2004 at 04:38:03PM +0200, Bjørn Mork wrote:
      >> +	if (!it87_find(&addr)) {
      >> +		printk("it87.o: new ISA address: 0x%04x\n", addr);
      >
      > That printk is wrong (no KERN_ level, or dev_printk() style use).
      > Please fix it in your next revision of this patch.
      
      Errh, I just added it to document my sloppiness.  It was never meant
      to be in the patch I sent you.  Sorry.  Removed in the attached patch.
      The style of these drivers seems to be "just working, making no noise"
      so I assume informational printk's are unwanted.
    • [PATCH] I2C: ICH6/6300ESB i2c support · 5fc5be30
      Jason D. Gaston authored
      This patch adds DID support for ICH6 and 6300ESB to i2c-i801.c (SMBus).
      In order to add this support I needed to patch pci_ids.h with the SMBus
      DIDs.  To keep things organized I renumbered the ICH6 and ESB entries
      in pci_ids.h.  I then patched the piix IDE and i810 audio drivers to
      reflect the updated #defines.  I also removed an error from irq.c;
      there was a reference to a 6300ESB DID that does not exist.
    • Greg Kroah-Hartman's avatar
      8382d1fb
    • Greg Kroah-Hartman's avatar
      f6442a84
    • Greg Kroah-Hartman's avatar
      02eb8c10
    • Merge bk://linux-acpi.bkbits.net/linux-acpi-release-2.6.6 · 5458096c
      Linus Torvalds authored
      into ppc970.osdl.org:/home/torvalds/v2.6/linux
    • [PATCH] USB: hcd-pci suspend tweak · e044323a
      David Brownell authored
      I needed this to get an APM + UHCI config to behave on resume.
      Applies against your BK of last night ... OHCI and EHCI do
      some of this manually, they could be simplified later.
    • [PATCH] sysfs_rename_dir-cleanup · 51c0c34c
      Maneesh Soni authored
      o The following patch cleans up sysfs_rename_dir(). It now checks the
        return code of kobject_set_name() and propagates the error code to its
        callers. Because of this there are changes in the following two APIs. Both
        return int instead of void.
      
      int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
      int kobject_rename(struct kobject * kobj, char *new_name)