1. 20 Jun, 2007 20 commits
    • spidernet: Cure RX ram full bug · 4c4bd5a9
      Linas Vepstas authored
      This patch fixes a rare deadlock that can occur when the kernel
      is not able to empty out the RX ring quickly enough. Below follows
      a detailed description of the bug and the fix.
      
      As long as the OS can empty out the RX buffers at a rate faster than
      the hardware can fill them, there is no problem. If, for some reason,
      the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
      pointer will catch up to the head, notice the not-empty condition,
      and stop. However, RX packets may still continue arriving on the wire.
      The spidernet chip can save some limited number of these in local RAM.
      When this local RAM fills up, the spider chip will issue an interrupt
      indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
      will be set in GHIINT1STS).  When the RX RAM full condition occurs,
      a certain bug/feature is triggered that has to be specially handled.
      This section describes the special handling for this condition.
      
      When the OS finally has a chance to run, it will empty out the RX ring.
      In particular, it will clear the descriptor on which the hardware had
      stopped. However, once the hardware has decided that a certain
      descriptor is invalid, it will not restart at that descriptor; instead
      it will restart at the next descr. This potentially will lead to a
      deadlock condition, as the tail pointer will be pointing at this descr,
      which, from the OS point of view, is empty; the OS will be waiting for
      this descr to be filled. However, the hardware has skipped this descr,
      and is filling the next descrs. Since the OS doesn't see this, there
      is a potential deadlock, with the OS waiting for one descr to fill,
      while the hardware is waiting for a different set of descrs to become
      empty.
      
      A call to show_rx_chain() at this point indicates the nature of the
      problem. A typical print when the network is hung shows the following:
      
      net eth1: Spider RX RAM full, incoming packets might be discarded!
      net eth1: Total number of descrs=256
      net eth1: Chain tail located at descr=255
      net eth1: Chain head is at 255
      net eth1: HW curr desc (GDACTDPA) is at 0
      net eth1: Have 1 descrs with stat=xa0800000
      net eth1: HW next desc (GDACNEXTDA) is at 1
      net eth1: Have 127 descrs with stat=x40800101
      net eth1: Have 1 descrs with stat=x40800001
      net eth1: Have 126 descrs with stat=x40800101
      net eth1: Last 1 descrs with stat=xa0800000
      
      Both the tail and head pointers are pointing at descr 255, which is
      marked xa... which is "empty". Thus, from the OS point of view, there
      is nothing to be done. In particular, there is the implicit assumption
      that everything in front of the "empty" descr must surely also be empty,
      as explained in the last section. The OS is waiting for descr 255 to
      become non-empty, which, in this case, will never happen.
      
      The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
      Since it's already full, the hardware can do nothing more, and thus has
      halted processing. Notice that descrs 0 through 253 are all marked
      "full", while descrs 254 and 255 are empty. (The "Last 1 descrs" is
      descr 254, since tail was at 255.) Thus, the system is deadlocked,
      and there can be no forward progress; the OS thinks there's nothing
      to do, and the hardware has nowhere to put incoming data.
      
      This bug/feature is worked around with the spider_net_resync_head_ptr()
      routine. When the driver receives RX interrupts, but an examination
      of the RX chain seems to show it is empty, then it is probable that
      the hardware has skipped a descr or two (sometimes dozens under heavy
      network conditions). The spider_net_resync_head_ptr() subroutine will
      search the ring for the next full descr, and the driver will resume
      operations there.  Since this will leave "holes" in the ring, there
      is also a spider_net_resync_tail_ptr() that will skip over such holes.
      Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
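      The recovery step described in the spidernet commit message (search the
      ring for the next full descr and resume there) can be sketched roughly as
      follows. This is a standalone illustration, not the driver's code: the
      ring size, the STAT_* constants (inferred from the "xa..." = empty and
      "x4..." = full convention in the dump), and the name next_full_descr()
      are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE  8            /* assumed for the sketch; the dump shows 256 */
#define STAT_MASK  0xf0000000u  /* assumed: top nibble carries the state */
#define STAT_EMPTY 0xa0000000u  /* "xa..." in the dump: not yet filled */

/* Returns nonzero when a descriptor status word marks the descr as empty. */
int descr_is_empty(uint32_t stat)
{
    return (stat & STAT_MASK) == STAT_EMPTY;
}

/*
 * Walk the ring once, starting at `start`, and return the index of the
 * first descriptor that is not empty. Returns -1 when every descriptor
 * is empty, i.e. there is genuinely nothing to process.
 */
int next_full_descr(const uint32_t stat[RING_SIZE], size_t start)
{
    assert(start < RING_SIZE);
    for (size_t n = 0; n < RING_SIZE; n++) {
        size_t i = (start + n) % RING_SIZE;  /* wrap around the ring */
        if (!descr_is_empty(stat[i]))
            return (int)i;
    }
    return -1;
}
```

      With the head stuck on an empty descr at the end of the ring and full
      descrs starting at index 0, the search wraps around and reports index 0,
      which is where processing would resume in this sketch.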
    • spidernet: null out skb pointer after its been used. · 83d35145
      Linas Vepstas authored
      Avoid kernel crash in mm/slab.c due to double-free of pointer.
      
      If the ethernet interface is brought down while there is still
      RX traffic in flight, the device shutdown routine can end up
      trying to double-free an skb, leading to a crash in mm/slab.c.
      Avoid the double-free by nulling out the skb pointer.
      Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
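      The general pattern behind this fix, clearing a pointer right after the
      object it refers to has been released, might look like the sketch below.
      It is a userspace analogy only: struct descr, its skb member, and
      descr_release_skb() are hypothetical stand-ins, and free() takes the
      place of whatever the driver actually calls to release an skb.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical descriptor that owns a buffer (stand-in for an skb). */
struct descr {
    void *skb;
};

/*
 * Release the buffer owned by a descriptor and forget the pointer.
 * Because the pointer is reset to NULL, a second call (for example
 * from a shutdown path that walks every descriptor) frees nothing.
 */
void descr_release_skb(struct descr *d)
{
    assert(d != NULL);
    free(d->skb);    /* free(NULL) is defined to do nothing */
    d->skb = NULL;   /* later passes see no buffer to release */
}
```

      Calling descr_release_skb() twice on the same descriptor is harmless
      here, which is the property the commit message is after.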
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 · d025d785
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
        firewire: Only set client->iso_context if allocation was successful.
        ieee1394: fix to ether1394_tx in ether1394.c
        firewire: fix hang after card ejection
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · b3f4256f
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
        IB/mlx4: Make sure inline data segments don't cross a 64 byte boundary
        IB/mlx4: Handle FW command interface rev 3
        IB/mlx4: Handle buffer wraparound in __mlx4_ib_cq_clean()
        IB/mlx4: Get rid of max_inline_data calculation
        IB/mlx4: Handle new FW requirement for send request prefetching
        IB/mlx4: Fix warning in rounding up queue sizes
        IB/mlx4: Fix handling of wq->tail for send completions
    • firewire: Only set client->iso_context if allocation was successful. · 24315c5e
      Kristian Høgsberg authored
      This patch fixes an OOPS on cdev release for an fd where iso context
      creation failed.
      Signed-off-by: Kristian Høgsberg <krh@redhat.com>
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    • Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus · 044f620a
      Linus Torvalds authored
      * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
        [MIPS] Don't drag a platform specific header into generic arch code.
    • Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc · c53ab5d5
      Linus Torvalds authored
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
        [POWERPC] Fix powermac late initcall to only run on powermac
        [POWERPC] PowerPC: Prevent data exception in kernel space (32-bit)
    • Fix up CREDIT entry ordering · 8acff0a2
      Li Yang authored
      Reorder my CREDITS entry so that it is in alphabetical order by last name.
      Signed-off-by: Li Yang <leoli@freescale.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86_64: fix link warning between for .text and .init.text · bf8c4817
      Yinghai Lu authored
      WARNING: arch/x86_64/kernel/built-in.o(.text+0xace9): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
      WARNING: arch/x86_64/kernel/built-in.o(.text+0xad09): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
      WARNING: arch/x86_64/kernel/built-in.o(.text+0xad38): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
      WARNING: drivers/built-in.o(.text+0x3a680): Section mismatch: reference to .init.text:acpi_map_pxm_to_node (between 'acpi_get_node' and 'acpi_lock_ac_dir')
      
      AK: also marked mtrr_bp_init __init to avoid some more warnings
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Acked-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: change_page_attr bandaids · 018d2ad0
      Andi Kleen authored
      - Disable CLFLUSH again; it is still broken. Always do WBINVD.
      - Always flush in the i386 case, not only when there are deferred pages.
      
      These are both brute-force inefficient fixes, to be improved
      next release cycle.
      
      The changes to i386 are a little more extensive than strictly
      needed (some dead code added), but it is more similar to the x86-64 version
      now and the dead code will be used soon.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: Disable KPROBES with DEBUG_RODATA for now · 55181000
      Andi Kleen authored
      Right now Kprobes cannot write to the write protected kernel text when
      DEBUG_RODATA is enabled. Disallow this in Kconfig for now.
      
      Temporary fix for 2.6.22. In .23 add code to temporarily
      unprotect it.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: Only make Macintosh drivers default on Macs · 9f1f79e6
      Andi Kleen authored
      It's already annoying that they appear on x86 now -- that's for the 3-button
      emulation needed on x86 Macs -- but at least don't make them default.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86_64: Quieten Atari keyboard warnings in Kconfig · 0e52d328
      Andi Kleen authored
      Not directly related to x86, but I got tired of seeing these warnings on every
      kconfig update when building on a non-m68k box:
      
      drivers/input/keyboard/Kconfig:170:warning: 'select' used by config symbol 'KEYBOARD_ATARI' refers to undefined symbol 'ATARI_KBD_CORE'
      drivers/input/mouse/Kconfig:182:warning: 'select' used by config symbol 'MOUSE_ATARI' refers to undefined symbol 'ATARI_KBD_CORE'
      
      I moved the definition of ATARI_KBD_CORE into drivers/input/keyboard/Kconfig
      so it's always seen by Kconfig.
      
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: Disable DAC on VIA bridges · 388c19e1
      Andi Kleen authored
      Several reports that VIA bridges don't support DAC and corrupt
      data.  I don't know if it's fixed, but let's just blacklist
      them all for now.
      
      It can be overridden with iommu=usedac.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86_64: Fix eventd/timerfd syscalls · 0b622330
      Andi Kleen authored
      They had the same syscall number.
      
      Pointed out by Davide Libenzi
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86_64: Fix readahead/sync_file_range/fadvise64 compat calls · e412ac49
      Andi Kleen authored
      Correctly convert the u64 arguments from 32-bit to 64-bit.
      
      Pointed out by Heiko Carstens.
      
      I guess this proves Linus' theory that nobody uses the more exotic Linux
      specific syscalls.  It wasn't discovered by a user.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
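      As a rough illustration of what converting a u64 argument from 32-bit to
      64-bit involves: a 32-bit caller typically passes a 64-bit value as two
      32-bit halves, and a compat wrapper has to reassemble them before calling
      the native implementation. The helper names below are hypothetical, and
      which half arrives first is an assumption that varies by ABI and syscall;
      this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Rebuild a 64-bit value from the two 32-bit halves a 32-bit caller
 * passed. The cast must happen before the shift; shifting a 32-bit
 * value by 32 would be undefined.
 */
uint64_t compat_join_u64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Inverse operation: how a 32-bit caller would split the value. */
void compat_split_u64(uint64_t value, uint32_t *lo, uint32_t *hi)
{
    assert(lo != NULL && hi != NULL);
    *lo = (uint32_t)(value & 0xffffffffu);
    *hi = (uint32_t)(value >> 32);
}
```

      A wrapper for a call such as readahead would, under these assumptions,
      join the offset halves this way and pass the result through unchanged.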
    • [MIPS] Don't drag a platform specific header into generic arch code. · 3b1d4ed5
      Ralf Baechle authored
      For some platforms its definitions may conflict.  So that's the one-liner.
      The rest is 10 square kilometers of fixup for the collateral damage this
      include used to paper over.
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • [POWERPC] Fix powermac late initcall to only run on powermac · c5f226c7
      Tony Breeds authored
      Current ppc64_defconfig kernel fails to boot on iSeries, dying with:
      
      Unable to handle kernel paging request for data at address 0x00000000
      Faulting instruction address: 0xc00000000071b258
      Oops: Kernel access of bad area, sig: 11 [#1]
      SMP NR_CPUS=32 iSeries
      <snip>
      NIP [c00000000071b258] .iSeries_src_init+0x34/0x64
      LR [c000000000701bb4] .kernel_init+0x1fc/0x3bc
      Call Trace:
      [c000000007d0be30] [0000000000008000] 0x8000 (unreliable)
      [c000000007d0bea0] [c000000000701bb4] .kernel_init+0x1fc/0x3bc
      [c000000007d0bf90] [c0000000000262d4] .kernel_thread+0x4c/0x68
      Instruction dump:
      e922cba8 3880ffff 78840420 f8010010 f821ff91 60000000 e8090000 78095fe3
      4182002c e922cb58 e862cbb0 e9290140 <e8090000> f8410028 7c0903a6 e9690010
      Kernel panic - not syncing: Attempted to kill init!
      
      This happens because some powermac code unconditionally sets
      ppc_md.progress to NULL.  This patch makes sure the powermac late
      initcall is only run on powermac machines.
      Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] PowerPC: Prevent data exception in kernel space (32-bit) · 9ba4ace3
      Segher Boessenkool authored
      The "is_exec" branch of the protection check in do_page_fault()
      didn't do anything on 32-bit PowerPC.  So if a userland program
      jumps to a page with Linux protection flags "---p", all the tests
      happily fall through, and handle_mm_fault() is called, which in
      turn calls handle_pte_fault(), which calls update_mmu_cache(),
      which goes on to flush the dcache for a page with no access rights.
      
      Boom.
      
      This fixes it.
      Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] rheap - eliminates internal fragments caused by alignment · 7c8545e9
      Li Yang authored
      The patch adds fragments caused by rh_alloc_align() back to the free list,
      instead of allocating the whole chunk of memory.  This will greatly improve
      utilization of the memory managed by rheap.
      
      It solves the problem of running out of MURAM with 3 UCCs enabled on MPC8323.
      Signed-off-by: Li Yang <leoli@freescale.com>
      Acked-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
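      The idea in the rheap commit, handing the padding produced by an aligned
      allocation back to the free list instead of consuming the whole free
      block, can be sketched as below. This is an illustrative model only:
      struct block, carve_aligned(), and the three-way split are assumptions
      for the example and do not reproduce rh_alloc_align() itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a contiguous region: [start, start + size). */
struct block {
    size_t start;
    size_t size;
};

/*
 * Carve `size` bytes aligned to `align` (a power of two) out of the free
 * block `blk`. On success returns 0 and fills in:
 *   alloc - the aligned region handed to the caller
 *   front - the padding before it, which can go back on the free list
 *   back  - the remainder after it, which can also go back
 * Returns -1 when the request does not fit.
 */
int carve_aligned(struct block blk, size_t size, size_t align,
                  struct block *alloc, struct block *front,
                  struct block *back)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t aligned = (blk.start + align - 1) & ~(align - 1);
    size_t end = blk.start + blk.size;

    if (aligned > end || size > end - aligned)
        return -1;

    *alloc = (struct block){ aligned, size };
    *front = (struct block){ blk.start, aligned - blk.start };
    *back  = (struct block){ aligned + size, end - (aligned + size) };
    return 0;
}
```

      In this model the front and back pieces together recover everything in
      the original block except the bytes actually requested, so nothing is
      lost to alignment padding.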
  2. 19 Jun, 2007 16 commits
  3. 18 Jun, 2007 4 commits
    • Fix possible runqueue lock starvation in wait_task_inactive() · fa490cfd
      Linus Torvalds authored
      Miklos Szeredi reported very long pauses (several seconds, sometimes
      more) on his T60 (with a Core2Duo) which he managed to track down to
      wait_task_inactive()'s open-coded busy-loop.
      
      He observed that an interrupt on one core tries to acquire the
      runqueue-lock but does not succeed in doing so for a very long time -
      while wait_task_inactive() on the other core loops waiting for the first
      core to deschedule a task (which it won't do while spinning in an
      interrupt handler).
      
      This rewrites wait_task_inactive() to do all its waiting optimistically
      without any locks taken at all, and then just double-check the end
      result with the proper runqueue lock held over just a very short
      section.  If there were races in the optimistic wait, or a preemption
      event scheduled the process away, we simply re-synchronize, and start
      over.
      
      So the code now looks like this:
      
      	repeat:
      		/* Unlocked, optimistic looping! */
      		rq = task_rq(p);
      		while (task_running(rq, p))
      			cpu_relax();
      
      		/* Get the *real* values */
      		rq = task_rq_lock(p, &flags);
      		running = task_running(rq, p);
      		array = p->array;
      		task_rq_unlock(rq, &flags);
      
      		/* Check them.. */
      		if (unlikely(running)) {
      			cpu_relax();
      			goto repeat;
      		}
      
      		/* Preempted away? Yield if so.. */
      		if (unlikely(array)) {
      			yield();
      			goto repeat;
      		}
      
      Basically, that first "while()" loop is done entirely without any
      locking at all (and doesn't check for the case where the target process
      might have been preempted away), and so it's possibly "incorrect", but
      we don't really care.  Both the runqueue used, and the "task_running()"
      check might be the wrong tests, but they won't oops - they just mean
      that we could possibly get the wrong results due to lack of locking and
      exit the loop early in the case of a race condition.
      
      So once we've exited the loop, we then get the proper (and careful) rq
      lock, and check the running/runnable state _safely_.  And if it turns
      out that our quick-and-dirty and unsafe loop was wrong after all, we
      just go back and try it all again.
      
      (The patch also adds a lot of comments, which is the actual bulk of it
      all, to make it more obvious why we can do these things without holding
      the locks).
      
      Thanks to Miklos for all the testing and tracking it down.
      Tested-by: Miklos Szeredi <miklos@szeredi.hu>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sched: fix SysRq-N (normalize RT tasks) · a0f98a1c
      Ingo Molnar authored
      Gene Heskett reported the following problem while testing CFS: SysRq-N
      is not always effective in normalizing tasks back to SCHED_OTHER.
      
      The reason for that turns out to be the following bug:
      
       - normalize_rt_tasks() uses for_each_process() to iterate through all
         tasks in the system.  The problem is, this method does not iterate
         through all tasks, it iterates through all thread groups.
      
      The proper mechanism to enumerate over all threads is to use a
      do_each_thread() + while_each_thread() loop.
      Reported-by: Gene Heskett <gene.heskett@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
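      The difference between iterating over thread groups and iterating over
      every thread, which is the bug described in the SysRq-N commit, can be
      modelled in a few lines. The structures and function names here are
      hypothetical stand-ins, not the kernel's task_struct or its iteration
      macros; the sketch only shows why visiting one entry per group leaves
      the other threads untouched.

```c
#include <assert.h>
#include <stddef.h>

enum { POLICY_OTHER = 0, POLICY_FIFO = 1 };  /* assumed values */

struct thread {
    int policy;
};

/* A thread group; threads[0] plays the role of the group leader. */
struct process {
    struct thread *threads;
    size_t nr_threads;
};

/* Flawed pattern: one visit per thread group, so only leaders change. */
size_t normalize_leaders_only(struct process *procs, size_t n)
{
    size_t visited = 0;
    for (size_t i = 0; i < n; i++) {
        assert(procs[i].nr_threads > 0);
        procs[i].threads[0].policy = POLICY_OTHER;
        visited++;
    }
    return visited;
}

/* Intended pattern: visit every thread of every group. */
size_t normalize_all_threads(struct process *procs, size_t n)
{
    size_t visited = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t t = 0; t < procs[i].nr_threads; t++) {
            procs[i].threads[t].policy = POLICY_OTHER;
            visited++;
        }
    }
    return visited;
}
```

      With one three-thread group and one single-thread group, the first
      function touches two entries and the second touches four, leaving no
      thread on the real-time policy.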
    • Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 · 4cc21505
      Linus Torvalds authored
      * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
        [SCSI] ESP: Don't forget to clear ESP_FLAG_RESETTING.
        [SCSI] fusion: fix for BZ 8426 - massive slowdown on SCSI CD/DVD drive
    • Fix signalfd interaction with thread-private signals · caec4e8d
      Benjamin Herrenschmidt authored
      Don't let signalfd dequeue private signals off other threads (in the
      case of things like SIGILL or SIGSEGV, trying to do so would result
      in undefined behaviour on who actually gets the signal, since they
      are force unblocked).
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>