1. 18 Jun, 2007 5 commits
    • Fix possible runqueue lock starvation in wait_task_inactive() · fa490cfd
      Linus Torvalds authored
      Miklos Szeredi reported very long pauses (several seconds, sometimes
      more) on his T60 (with a Core2Duo) which he managed to track down to
      wait_task_inactive()'s open-coded busy-loop.
      
      He observed that an interrupt on one core tries to acquire the
      runqueue-lock but does not succeed in doing so for a very long time -
      while wait_task_inactive() on the other core loops waiting for the first
      core to deschedule a task (which it won't do while spinning in an
      interrupt handler).
      
      This rewrites wait_task_inactive() to do all its waiting optimistically
      without any locks taken at all, and then just double-check the end
      result with the proper runqueue lock held over just a very short
      section.  If there were races in the optimistic wait, or a preemption
      event scheduled the process away, we simply re-synchronize, and start
      over.
      
      So the code now looks like this:
      
      	repeat:
      		/* Unlocked, optimistic looping! */
      		rq = task_rq(p);
      		while (task_running(rq, p))
      			cpu_relax();
      
      		/* Get the *real* values */
      		rq = task_rq_lock(p, &flags);
      		running = task_running(rq, p);
      		array = p->array;
      		task_rq_unlock(rq, &flags);
      
      		/* Check them.. */
      		if (unlikely(running)) {
      			cpu_relax();
      			goto repeat;
      		}
      
      		/* Preempted away? Yield if so.. */
      		if (unlikely(array)) {
      			yield();
      			goto repeat;
      		}
      
      Basically, that first "while()" loop is done entirely without any
      locking at all (and doesn't check for the case where the target process
      might have been preempted away), and so it's possibly "incorrect", but
      we don't really care.  Both the runqueue used, and the "task_running()"
      check might be the wrong tests, but they won't oops - they just mean
      that we could possibly get the wrong results due to lack of locking and
      exit the loop early in the case of a race condition.
      
      So once we've exited the loop, we then get the proper (and careful) rq
      lock, and check the running/runnable state _safely_.  And if it turns
      out that our quick-and-dirty and unsafe loop was wrong after all, we
      just go back and try it all again.
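
The retry pattern described above can be modelled outside the kernel. What follows is a minimal single-threaded sketch, not the kernel code: `struct sim`, `struct sim_task`, `relax()` and `wait_task_inactive_sim()` are hypothetical names, time is a simulated tick counter, and "holding the lock" is only counted, not enforced. It is meant to show that the time spent under the lock depends on the number of retries, not on how long the target keeps running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulated task: its state is a pure function of time. */
struct sim_task {
    int stop_tick;     /* task leaves the CPU at this tick */
    int dequeue_tick;  /* task leaves the runqueue at this tick */
};

struct sim {
    int now;           /* simulated time in ticks */
    int locked_ticks;  /* how often the simulated rq lock was taken */
};

static bool task_running(const struct sim *s, const struct sim_task *t)
{
    return s->now < t->stop_tick;
}

static bool task_queued(const struct sim *s, const struct sim_task *t)
{
    return s->now < t->dequeue_tick;
}

/* Stand-in for cpu_relax()/yield(): just lets simulated time pass. */
static void relax(struct sim *s)
{
    s->now++;
}

/* Returns how many times the locked double-check forced a retry. */
static int wait_task_inactive_sim(struct sim *s, const struct sim_task *t)
{
    int retries = 0;

    for (;;) {
        /* Optimistic phase: spin without the lock. */
        while (task_running(s, t))
            relax(s);

        /* Short "locked" section: sample the authoritative state. */
        s->locked_ticks++;
        bool running = task_running(s, t);
        bool queued = task_queued(s, t);

        if (!running && !queued)
            return retries;

        /* Still running or merely preempted: back off and start over. */
        retries++;
        relax(s);
    }
}
```

With a task that stops running at tick 1000 and leaves the runqueue at tick 1003, the sketch spins unlocked for 1000 ticks and takes the simulated lock only four times.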
      
      (The patch also adds a lot of comments, which is the actual bulk of it
      all, to make it more obvious why we can do these things without holding
      the locks).
      
      Thanks to Miklos for all the testing and tracking it down.
      Tested-by: Miklos Szeredi <miklos@szeredi.hu>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sched: fix SysRq-N (normalize RT tasks) · a0f98a1c
      Ingo Molnar authored
      Gene Heskett reported the following problem while testing CFS: SysRq-N
      is not always effective in normalizing tasks back to SCHED_OTHER.
      
      The reason for that turns out to be the following bug:
      
       - normalize_rt_tasks() uses for_each_process() to iterate through all
         tasks in the system.  The problem is, this method does not iterate
         through all tasks, it iterates through all thread groups.
      
      The proper mechanism to enumerate over all threads is to use a
      do_each_thread() + while_each_thread() loop.
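
The difference between the two iteration styles can be illustrated with a toy task list. This is a sketch under stated assumptions, not the kernel's data structures: `struct mini_task`, its two link fields and all function names are hypothetical, with group leaders chained in a NULL-terminated list and the threads of each group in a circular list.

```c
#include <assert.h>
#include <stddef.h>

enum { POLICY_NORMAL = 0, POLICY_RT = 1 };

/* Hypothetical miniature task, not the kernel's task_struct. */
struct mini_task {
    int policy;
    struct mini_task *next_leader;  /* next thread-group leader, or NULL */
    struct mini_task *next_thread;  /* circular list within one group */
};

/* Resets an RT task to the normal policy; returns 1 if it changed. */
static int normalize_one(struct mini_task *t)
{
    if (t->policy != POLICY_RT)
        return 0;
    t->policy = POLICY_NORMAL;
    return 1;
}

/* Like a for_each_process() walk: visits group leaders only. */
static int normalize_leaders_only(struct mini_task *first)
{
    int changed = 0;

    for (struct mini_task *g = first; g != NULL; g = g->next_leader)
        changed += normalize_one(g);
    return changed;
}

/* Like do_each_thread()/while_each_thread(): visits every thread. */
static int normalize_all_threads(struct mini_task *first)
{
    int changed = 0;

    for (struct mini_task *g = first; g != NULL; g = g->next_leader) {
        struct mini_task *t = g;
        do {
            changed += normalize_one(t);
            t = t->next_thread;
        } while (t != g);
    }
    return changed;
}

/* Group A: normal leader tasks[0] plus RT thread tasks[1].
 * Group B: single-threaded RT leader tasks[2]. */
static void build_example(struct mini_task tasks[3])
{
    tasks[0] = (struct mini_task){ POLICY_NORMAL, &tasks[2], &tasks[1] };
    tasks[1] = (struct mini_task){ POLICY_RT, NULL, &tasks[0] };
    tasks[2] = (struct mini_task){ POLICY_RT, NULL, &tasks[2] };
}
```

In this example the leaders-only walk normalizes the single-threaded RT leader but misses the RT thread inside group A, which only the per-thread walk reaches.
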
      Reported-by: Gene Heskett <gene.heskett@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 · 4cc21505
      Linus Torvalds authored
      * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
        [SCSI] ESP: Don't forget to clear ESP_FLAG_RESETTING.
        [SCSI] fusion: fix for BZ 8426 - massive slowdown on SCSI CD/DVD drive
    • Fix signalfd interaction with thread-private signals · caec4e8d
      Benjamin Herrenschmidt authored
      Don't let signalfd dequeue private signals off other threads (in the
      case of things like SIGILL or SIGSEGV, trying to do so would result
      in undefined behaviour on who actually gets the signal, since they
      are force unblocked).
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "futex_requeue_pi optimization" · bd197234
      Thomas Gleixner authored
      This reverts commit d0aa7a70.
      
      It not only introduced user space visible changes to the futex syscall,
      it is also non-functional and there is no way to fix it properly before
      the 2.6.22 release.
      
      The breakage report ( http://lkml.org/lkml/2007/5/12/17 ) went
      unanswered, and unfortunately it turned out that the concept is not
      feasible at all.  It violates the rtmutex semantics badly by introducing
      a virtual owner, which hacks around the coupling of the user-space
      pi_futex and the kernel internal rt_mutex representation.
      
      At the moment the only safe option is to remove it fully as it contains
      user-space visible changes to broken kernel code, which we do not want
      to expose in the 2.6.22 release.
      
      The patch reverts the original patch mostly 1:1, but contains a couple
      of trivial manual cleanups which were necessary because later patches
      touched the same area of code.
      
      Verified against the glibc tests and my own PI futex tests.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Ulrich Drepper <drepper@redhat.com>
      Cc: Pierre Peiffer <pierre.peiffer@bull.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 17 Jun, 2007 1 commit
  3. 16 Jun, 2007 27 commits
  4. 15 Jun, 2007 7 commits
    • mm: Fix memory/cpu hotplug section mismatch and oops. · d09c6b80
      Paul Mundt authored
      When building with memory hotplug enabled and cpu hotplug disabled, we
      end up with the following section mismatch:
      
      WARNING: mm/built-in.o(.text+0x4e58): Section mismatch: reference to
      .init.text: (between 'free_area_init_node' and '__build_all_zonelists')
      
      This happens as a result of:
      
              -> free_area_init_node()
                -> free_area_init_core()
                  -> zone_pcp_init() <-- all __meminit up to this point
                    -> zone_batchsize() <-- marked as __cpuinit
      
      This happens because CONFIG_HOTPLUG_CPU=n sets __cpuinit to __init, but
      CONFIG_MEMORY_HOTPLUG=y unsets __meminit.
      
      Changing zone_batchsize() to __devinit fixes this.
      
      __devinit is the only thing that is common between CONFIG_HOTPLUG_CPU=y and
      CONFIG_MEMORY_HOTPLUG=y. In the long run, perhaps this should be moved to
      another section identifier completely. Without this, memory hot-add
      of offline nodes (via hotadd_new_pgdat()) will oops if CPU hotplug is
      not also enabled.
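
The mismatch can be reasoned about with a small model of where each annotation places code. This is a simplified sketch, not the contents of the kernel's init.h: the helper names are hypothetical, sections are reduced to "resident" (`.text`) versus "discarded after boot" (`.init.text`), and it assumes that `__devinit` follows CONFIG_HOTPLUG, which both hotplug options are taken to require.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* __cpuinit code is only kept resident when CPU hotplug is enabled. */
static const char *cpuinit_section(bool hotplug_cpu)
{
    return hotplug_cpu ? ".text" : ".init.text";
}

/* __meminit code is only kept resident when memory hotplug is enabled. */
static const char *meminit_section(bool memory_hotplug)
{
    return memory_hotplug ? ".text" : ".init.text";
}

/* Assumption: __devinit code is kept resident whenever CONFIG_HOTPLUG=y. */
static const char *devinit_section(bool hotplug)
{
    return hotplug ? ".text" : ".init.text";
}

/* A resident caller that references discardable code is a mismatch:
 * the callee may already be freed when the caller runs after boot. */
static bool section_mismatch(const char *caller, const char *callee)
{
    return strcmp(caller, ".text") == 0 &&
           strcmp(callee, ".init.text") == 0;
}
```

Under these assumptions, a `__meminit` caller with memory hotplug enabled conflicts with a `__cpuinit` callee when CPU hotplug is disabled, while a `__devinit` callee stays resident in both configurations.
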
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      --
      
       mm/page_alloc.c |    2 +-
       1 file changed, 1 insertion(+), 1 deletion(-)
    • Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6 · 902233ee
      Linus Torvalds authored
      * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (30 commits)
        Blackfin SMC91X ethernet supporting driver: SMC91C111 LEDs are note drived in the kernel like in uboot
        Blackfin SPI driver: fix bug SPI DMA incomplete transmission
        Blackfin SPI driver: tweak spi cleanup function to match newer kernel changes
        Blackfin RTC drivers: update MAINTAINERS information
        Blackfin serial driver: decouple PARODD and CMSPAR checking from PARENB
        Blackfin serial driver: actually implement the break_ctl() function
        Blackfin serial driver: ignore framing and parity errors
        Blackfin serial driver: hook up our UARTs STP bit with userspaces CMSPAR
        Blackfin arch: move HI/LO macros into blackfin.h and punt the rest of macros.h as it includes VDSP macros we never use
        Blackfin arch: redo our linker script a bit
        Blackfin arch: make sure we initialize our L1 Data B section properly based on the linked kernel
        Blackfin arch: fix bug can not wakeup from sleep via push buttons
        Blackfin arch: add support for Alon Bar-Lev's dynamic kernel command-line
        Blackfin arch: add missing gpio.h header to fix compiling in some pm configurations
        Blackfin arch: As Mike pointed out range goes form m..MAX_BLACKFIN_GPIO -1
        Blackfin arch: fix spelling typo in output
        Blackfin arch: try to split up functions like this into smaller units according to LKML review
        Blackfin arch: add proper ENDPROC()
        Blackfin arch: move more of our startup code to .init so it can be freed once we are up and running
        Blackfin arch: unify differences between our diff head.S files -- no functional changes
        ...
    • Merge branch 'splice-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block · e871e3c2
      Linus Torvalds authored
      * 'splice-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block:
        splice: only check do_wakeup in splice_to_pipe() for a real pipe
        splice: fix leak of pages on short splice to pipe
        splice: adjust balance_dirty_pages_ratelimited() call
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm · 3ea88d67
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
        KVM: Prevent guest fpu state from leaking into the host
    • Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus · 4ff4275b
      Linus Torvalds authored
      * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
        [MIPS] Fix builds where MSC01E_xxx is undefined.
        [MIPS] Separate performance counter interrupts
        [MIPS] Malta: Fix for SOCitSC based Maltas
    • Merge branch 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32 · e00eea42
      Linus Torvalds authored
      * 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32:
        [AVR32] Define ARCH_KMALLOC_MINALIGN to L1_CACHE_BYTES
        [AVR32] STK1000: Set SPI_MODE_3 in the ltv350qv board info
        [AVR32] gpio_*_cansleep() fix
        [AVR32] ratelimit segfault reporting rate
    • block: always requeue !fs requests at the front · bc90ba09
      Tejun Heo authored
      SCSI marks internal commands with REQ_PREEMPT and pushes them to the front
      of the request queue using blk_execute_rq().  When entering suspended
      or frozen state, SCSI devices are quiesced using
      scsi_device_quiesce().  In quiesced state, only REQ_PREEMPT requests
      are processed.  This is how SCSI blocks other requests out while
      suspending and resuming.  As all internal commands are pushed at the
      front of the queue, this usually works.
      
      Unfortunately, this interacts badly with ordered requeueing.  To
      preserve request order on requeueing (due to busy device, active EH or
      other failures), requests are sorted according to ordered sequence on
      requeue if IO barrier is in progress.
      
      The following sequence deadlocks.
      
      1. IO barrier sequence issues.
      
      2. Suspend requested.  Queue is quiesced with part or all of IO
         barrier sequence at the front.
      
      3. During suspending or resuming, SCSI issues an internal command which
         gets deferred and requeued for some reason.  As the command is
         issued after the IO barrier in #1, ordered requeueing code puts the
         request after IO barrier sequence.
      
      4. The device is ready to process requests again but still is in
         quiesced state and the first request of the queue isn't
         REQ_PREEMPT, so command processing is deadlocked -
         suspending/resuming waits for the issued request to complete while
         the request can't be processed till device is put back into
         running state by resuming.
      
      This can be fixed by always putting !fs requests at the front when
      requeueing.
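
The deadlock and the fix can be sketched with a toy queue. This is a simplified model, not the block layer: `struct mini_req`, `struct mini_queue` and the helper functions are hypothetical, and ordered requeueing during a barrier is reduced to "insert behind the requests already queued".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define QMAX 8

/* Hypothetical miniature request, not the kernel's struct request. */
struct mini_req {
    int id;
    bool is_fs;    /* filesystem request; false for internal commands */
    bool preempt;  /* REQ_PREEMPT-like flag */
};

struct mini_queue {
    struct mini_req reqs[QMAX];
    int len;
};

/* Inserts r at position pos, shifting later entries back by one. */
static bool insert_at(struct mini_queue *q, int pos, struct mini_req r)
{
    if (q->len >= QMAX || pos < 0 || pos > q->len)
        return false;
    memmove(&q->reqs[pos + 1], &q->reqs[pos],
            (size_t)(q->len - pos) * sizeof(q->reqs[0]));
    q->reqs[pos] = r;
    q->len++;
    return true;
}

/* Simplified old behaviour while a barrier is in progress:
 * the requeued request lands behind the barrier sequence. */
static bool requeue_ordered(struct mini_queue *q, struct mini_req r)
{
    return insert_at(q, q->len, r);
}

/* Fixed behaviour: non-fs requests always go to the front. */
static bool requeue_fixed(struct mini_queue *q, struct mini_req r)
{
    return r.is_fs ? requeue_ordered(q, r) : insert_at(q, 0, r);
}

/* A quiesced device only makes progress if the head is a preempt request. */
static bool can_dispatch_quiesced(const struct mini_queue *q)
{
    return q->len > 0 && q->reqs[0].preempt;
}
```

With one barrier request already queued, requeueing an internal command through `requeue_ordered()` leaves a non-preempt request at the head and the quiesced queue stalls, whereas `requeue_fixed()` puts the internal command first so it can be dispatched.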
      
      The following thread reports this deadlock.
      
        http://thread.gmane.org/gmane.linux.kernel/537473

      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Acked-by: David Greaves <david@dgreaves.com>
      Acked-by: Jeff Garzik <jeff@garzik.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>