1. 24 Jun, 2008 8 commits
    • x86: Add structs and functions for paravirt clocksource · 7af192c9
      Gerd Hoffmann authored
      This patch adds structs for the paravirt clocksource ABI
      used by both xen and kvm (pvclock-abi.h).
      
      It also adds some helper functions to read system time and
      wall clock time from a paravirtual clocksource (pvclock.[ch]).
      They are based on the xen code.  They are enabled using
      CONFIG_PARAVIRT_CLOCK.
      
      Subsequent patches of this series will put the code to use.
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: VMX: Fix host msr corruption with preemption enabled · a9b21b62
      Avi Kivity authored
      Switching msrs can occur either synchronously as a result of calls to
      the msr management functions (usually in response to the guest touching
      virtualized msrs), or asynchronously when preempting a kvm thread that has
      guest state loaded.  If we're unlucky enough to have the two at the same
      time, host msrs are corrupted and the machine goes kaput on the next syscall.
      
      Most easily triggered by Windows Server 2008, as it does a lot of msr
      switching during bootup.
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: ioapic: fix lost interrupt when changing a device's irq · 4fa6b9c5
      Avi Kivity authored
      The ioapic acknowledge path translates interrupt vectors to irqs.  It
      currently uses a first match algorithm, stopping when it finds the first
      redirection table entry containing the vector.  That fails however if the
      guest changes the irq to a different line, leaving the old redirection table
      entry in place (though masked).  Result is interrupts not making it to the
      guest.
      
      Fix by always scanning the entire redirection table.
      Signed-off-by: Avi Kivity <avi@qumranet.com>
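
The difference between a first-match lookup and a full scan of the redirection table can be sketched as follows. This is a simplified illustration, not the KVM code: `struct redir_entry`, `ack_first_match()` and `ack_full_scan()` are hypothetical names, and the table is reduced to the three fields the argument needs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified redirection table entry. */
struct redir_entry {
    uint8_t vector;   /* interrupt vector programmed by the guest */
    uint8_t masked;   /* 1 if the entry is masked */
    uint8_t pending;  /* 1 if the line is waiting for an acknowledge */
};

/*
 * First-match lookup: stops at the first entry carrying the vector.
 * If that entry is a stale, masked leftover, the live entry further
 * down the table is never acknowledged.
 */
int ack_first_match(struct redir_entry *t, size_t n, uint8_t vector)
{
    for (size_t i = 0; i < n; i++) {
        if (t[i].vector == vector) {
            t[i].pending = 0;
            return 1;
        }
    }
    return 0;
}

/* Full scan: acknowledges every entry that carries the vector. */
int ack_full_scan(struct redir_entry *t, size_t n, uint8_t vector)
{
    int acked = 0;

    for (size_t i = 0; i < n; i++) {
        if (t[i].vector == vector) {
            t[i].pending = 0;
            acked++;
        }
    }
    return acked;
}
```

With a stale masked entry at index 0 and the live entry at index 2, both using the same vector, the first-match version leaves the live entry pending, while the full scan clears it.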
    • KVM: MMU: Fix oops on guest userspace access to guest pagetable · 6bf6a953
      Avi Kivity authored
      KVM has a heuristic to unshadow guest pagetables when userspace accesses
      them, on the assumption that most guests do not allow userspace to access
      pagetables directly. Unfortunately, in addition to unshadowing the pagetables,
      it also oopses.
      
      This never triggers on ordinary guests since sane OSes will clear the
      pagetables before assigning them to userspace, which will trigger the flood
      heuristic, unshadowing the pagetables before the first userspace access. One
      particular guest, though (Xenner) will run the kernel in userspace, triggering
      the oops.  Since the heuristic is incorrect in this case, we can simply
      remove it.
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend) · 30945387
      Marcelo Tosatti authored
      kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed
      guests properly. It will instantiate two 2MB sptes pointing to the same
      physical 2MB page when a guest large pte update is trapped.
      
      Instead of duplicating code to handle this, disallow directory level
      updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes
      emulating one guest 4MB pte can be correctly created by the page fault
      handling path.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: MMU: Fix rmap_write_protect() hugepage iteration bug · 6597ca09
      Marcelo Tosatti authored
      rmap_next() does not work correctly after rmap_remove(), as it expects
      the rmap chains not to change during iteration.  Fix (for now) by restarting
      iteration from the beginning.
      Signed-off-by: Avi Kivity <avi@qumranet.com>
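
The "restart from the beginning" approach can be illustrated on an ordinary singly linked list. This is only a sketch of the pattern under simplified assumptions: `struct node`, `chain_push()`, `chain_remove()` and `remove_writable()` are hypothetical stand-ins and do not reproduce the kernel's rmap data structures.

```c
#include <stdlib.h>

/* Hypothetical chain element; "writable" stands in for a writable spte. */
struct node {
    int writable;
    struct node *next;
};

/* Push a new element at the head of the chain. */
struct node *chain_push(struct node *head, int writable)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return head;
    n->writable = writable;
    n->next = head;
    return n;
}

/* Unlink and free one element; the chain changes shape as a result. */
void chain_remove(struct node **head, struct node *victim)
{
    struct node **pp = head;

    while (*pp && *pp != victim)
        pp = &(*pp)->next;
    if (*pp) {
        *pp = victim->next;
        free(victim);
    }
}

/*
 * Remove every writable element. After each removal the walk restarts
 * from the head instead of trusting a cursor into a chain that has
 * just been modified. Returns the number of elements removed.
 */
int remove_writable(struct node **head)
{
    int removed = 0;
    struct node *cur = *head;

    while (cur) {
        if (cur->writable) {
            chain_remove(head, cur);
            removed++;
            cur = *head;  /* restart iteration from the beginning */
            continue;
        }
        cur = cur->next;
    }
    return removed;
}
```

Restarting costs extra passes over the chain, which matches the commit's "for now" wording: it trades efficiency for not depending on iterator state that a removal may have invalidated.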
    • KVM: close timer injection race window in __vcpu_run · 06e05645
      Marcelo Tosatti authored
      If a timer fires after kvm_inject_pending_timer_irqs() but before
      local_irq_disable() the code will enter guest mode and only inject such
      timer interrupt the next time an unrelated event causes an exit.
      
      It would be simpler if the timer->pending irq conversion could be done
      with IRQs disabled, so that the above problem cannot happen.
      
      For now introduce a new vcpu requests bit to cancel guest entry.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: Fix race between timer migration and vcpu migration · d4acf7e7
      Marcelo Tosatti authored
      A guest vcpu instance can be scheduled to a different physical CPU
      between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable().
      
      If that happens, the timer will only be migrated to the current pCPU on
      the next exit, meaning that a guest LAPIC timer event can be delayed until
      a host interrupt is triggered.
      
      Fix it by cancelling guest entry if any vcpu request is pending.  This
      has the side effect of nicely consolidating vcpu->requests checks.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
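
The idea of cancelling guest entry whenever any request bit is pending can be modelled in a few lines. This is a single-threaded sketch with hypothetical names (`struct vcpu_sketch`, `try_enter_guest()`, `run_vcpu()`). It does not model real interrupts or scheduling, and it is not the code from the patch.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified vcpu state. */
struct vcpu_sketch {
    unsigned long requests;  /* bitmask of pending requests */
    bool irqs_disabled;      /* stand-in for local_irq_disable() state */
    int entries;             /* number of successful guest entries */
};

/*
 * Re-check the request mask after interrupts are disabled. Any request
 * that was raised between the earlier checks and this point cancels
 * the entry, so it is serviced before the guest runs.
 */
bool try_enter_guest(struct vcpu_sketch *v)
{
    v->irqs_disabled = true;
    if (v->requests) {
        v->irqs_disabled = false;
        return false;  /* cancel guest entry */
    }
    v->entries++;      /* guest would run here */
    v->irqs_disabled = false;
    return true;
}

/* Retry loop: service requests, then try again. Returns cancel count. */
int run_vcpu(struct vcpu_sketch *v)
{
    int cancelled = 0;

    while (!try_enter_guest(v)) {
        v->requests = 0;  /* stand-in for handling each request */
        cancelled++;
    }
    return cancelled;
}
```

Checking the whole mask in one place, rather than one bit at a time, is what gives the consolidation the commit message mentions.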
  2. 23 Jun, 2008 18 commits
  3. 22 Jun, 2008 1 commit
  4. 21 Jun, 2008 9 commits
    • Slab: Fix memory leak in fallback_alloc() · 481c5346
      Christoph Lameter authored
      The zonelist patches caused the loop that checks for available
      objects in permitted zones to not terminate immediately. One object
      per zone per allocation may be allocated and then abandoned.
      
      Break the loop when we have successfully allocated one object.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
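
The leak pattern and its fix can be reduced to a loop over zones. This is a toy model under stated assumptions: `NZONES`, `zone_alloc()`, `fallback_leaky()` and `fallback_fixed()` are hypothetical, and a static pool with a counter stands in for the slab allocator.

```c
#include <stddef.h>

#define NZONES 4  /* hypothetical number of permitted zones */

static char pool[NZONES][16];  /* one toy object per zone */
int objects_taken;             /* objects handed out so far */

/* Toy per-zone allocation: always succeeds and counts the object. */
void *zone_alloc(int zone)
{
    objects_taken++;
    return pool[zone];
}

/* Leaky variant: keeps looping, so earlier objects are abandoned. */
void *fallback_leaky(void)
{
    void *obj = NULL;

    for (int z = 0; z < NZONES; z++)
        obj = zone_alloc(z);  /* previous result is overwritten */
    return obj;
}

/* Fixed variant: leave the loop after the first successful allocation. */
void *fallback_fixed(void)
{
    void *obj = NULL;

    for (int z = 0; z < NZONES; z++) {
        obj = zone_alloc(z);
        if (obj)
            break;
    }
    return obj;
}
```

In this model the leaky variant takes one object per zone for a single request, while the fixed variant takes exactly one.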
    • Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 62a8efe6
      Linus Torvalds authored
      * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        Ext4: Fix online resize block group descriptor corruption
    • Merge branch 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6 · bec95aab
      Linus Torvalds authored
      * 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6:
        hwmon: (lm75) sensor reading bugfix
        hwmon: (abituguru3) update driver detection
        hwmon: (w83791d) new maintainer
        hwmon: (abituguru3) Identify Abit AW8D board as such
        hwmon: Update the sysfs interface documentation
        hwmon: (adt7473) Initialize max_duty_at_overheat before use
        hwmon: (lm85) Fix function RANGE_TO_REG()
    • Add return value to reserve_bootmem_node() · 71c2742f
      Bernhard Walle authored
      This patch changes the function reserve_bootmem_node() from void to int,
      returning -ENOMEM if the allocation fails.
      
      This fixes a build problem on x86 with CONFIG_KEXEC=y and
      CONFIG_NEED_MULTIPLE_NODES=y
      Signed-off-by: Bernhard Walle <bwalle@suse.de>
      Reported-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · a1921443
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
        netns: Don't receive new packets in a dead network namespace.
        sctp: Make sure N * sizeof(union sctp_addr) does not overflow.
        pppoe: warning fix
        ipv6: Drop packets for loopback address from outside of the box.
        ipv6: Remove options header when setsockopt's optlen is 0
        mac80211: detect driver tx bugs
    • netns: Don't receive new packets in a dead network namespace. · b9f75f45
      Eric W. Biederman authored
      Alexey Dobriyan <adobriyan@gmail.com> writes:
      > Subject: ICMP sockets destruction vs ICMP packets oops
      
      > After icmp_sk_exit() nuked ICMP sockets, we get an interrupt.
      > icmp_reply() wants ICMP socket.
      >
      > Steps to reproduce:
      >
      > 	launch shell in new netns
      > 	move real NIC to netns
      > 	setup routing
      > 	ping -i 0
      > 	exit from shell
      >
      > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      > IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30
      > PGD 17f3cd067 PUD 17f3ce067 PMD 0 
      > Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
      > CPU 0 
      > Modules linked in: usblp usbcore
      > Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4
      > RIP: 0010:[<ffffffff803fce17>]  [<ffffffff803fce17>] icmp_sk+0x17/0x30
      > RSP: 0018:ffffffff8057fc30  EFLAGS: 00010286
      > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900
      > RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800
      > RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815
      > R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28
      > R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800
      > FS:  0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      > CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0
      > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0)
      > Stack:  0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4
      >  ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246
      >  000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436
      > Call Trace:
      >  <IRQ>  [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0
      >  [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360
      >  [<ffffffff803fd645>] icmp_echo+0x65/0x70
      >  [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0
      >  [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0
      >  [<ffffffff803d71bb>] ip_rcv+0x33b/0x650
      >  [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340
      >  [<ffffffff803be57d>] process_backlog+0x9d/0x100
      >  [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250
      >  [<ffffffff80237be5>] __do_softirq+0x75/0x100
      >  [<ffffffff8020c97c>] call_softirq+0x1c/0x30
      >  [<ffffffff8020f085>] do_softirq+0x65/0xa0
      >  [<ffffffff80237af7>] irq_exit+0x97/0xa0
      >  [<ffffffff8020f198>] do_IRQ+0xa8/0x130
      >  [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60
      >  [<ffffffff8020bc46>] ret_from_intr+0x0/0xf
      >  <EOI>  [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60
      >  [<ffffffff80212f23>] ? mwait_idle+0x43/0x60
      >  [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0
      >  [<ffffffff8040f380>] ? rest_init+0x70/0x80
      > Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
      > 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08
      > 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00
      > RIP  [<ffffffff803fce17>] icmp_sk+0x17/0x30
      >  RSP <ffffffff8057fc30>
      > CR2: 0000000000000000
      > ---[ end trace ea161157b76b33e8 ]---
      > Kernel panic - not syncing: Aiee, killing interrupt handler!
      
      Receiving packets while we are cleaning up a network namespace is a
      racy proposition. It is possible when the packet arrives that we have
      removed some but not all of the state we need to fully process it.  We
      have the choice of either playing whack-a-mole with the cleanup routines
      or simply dropping packets when we don't have a network namespace to
      handle them.
      
      Since the check looks inexpensive in netif_receive_skb let's just
      drop the incoming packets.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
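
Dropping packets early for a namespace that is being torn down might look roughly like this. It is a sketch only: `struct netns_sketch`, `struct packet_sketch` and `receive_packet()` are hypothetical, and the real check lives in the kernel's receive path rather than in a standalone function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical namespace state: "alive" is cleared when cleanup starts. */
struct netns_sketch {
    bool alive;
    int delivered;  /* packets passed on to protocol handlers */
};

/* Hypothetical packet that records the namespace it arrived in. */
struct packet_sketch {
    struct netns_sketch *ns;
};

enum rx_result { RX_DROPPED = 0, RX_DELIVERED = 1 };

/*
 * One inexpensive check at the top of the receive path: if the
 * namespace is missing or already dying, drop the packet instead of
 * letting protocol code touch partially destroyed state.
 */
enum rx_result receive_packet(struct packet_sketch *pkt)
{
    if (pkt->ns == NULL || !pkt->ns->alive)
        return RX_DROPPED;

    pkt->ns->delivered++;  /* stand-in for normal protocol processing */
    return RX_DELIVERED;
}
```

A single check at the entry point avoids having to add guards to each cleanup routine separately, which is the trade-off the commit message describes.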
    • sctp: Make sure N * sizeof(union sctp_addr) does not overflow. · 735ce972
      David S. Miller authored
      As noticed by Gabriel Campana, the kmalloc() length arg
      passed in by sctp_getsockopt_local_addrs_old() can overflow
      if ->addr_num is large enough.
      
      Therefore, enforce an appropriate limit.
      Signed-off-by: David S. Miller <davem@davemloft.net>
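
A bound check of the kind described, done before multiplying a caller-supplied count by an element size, can be sketched as below. `count_is_safe()` is a hypothetical helper, and the exact limit the patch enforces is not stated in the message above, so the bound used here (an `int`-sized length) is an assumption.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Returns 1 if count * elem_size is guaranteed to fit in an int-sized
 * length, 0 otherwise. Non-positive counts are rejected as well, since
 * a signed count supplied by a caller may be negative.
 */
int count_is_safe(int count, size_t elem_size)
{
    if (count <= 0 || elem_size == 0)
        return 0;
    /* Compare against the quotient so the product is never computed. */
    return (size_t)count < (size_t)INT_MAX / elem_size;
}
```

A caller would run this check first and only then compute the allocation length, returning an error for rejected counts.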
    • pppoe: warning fix · 2645a3c3
      Stephen Hemminger authored
      Fix warning:
      drivers/net/pppoe.c: In function 'pppoe_recvmsg':
      drivers/net/pppoe.c:945: warning: comparison of distinct pointer types lacks a cast
      because skb->len is unsigned int and total_len is size_t
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
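
The warning comes from the pointer comparison that a type-checked minimum macro uses to detect mismatched argument types. The sketch below shows that mechanism and one common remedy, a macro that casts both sides to a named type. `MIN_CHECKED`, `MIN_T` and `bytes_to_copy()` are hypothetical names modelled on the kernel's `min()` and `min_t()`. Whether the patch used exactly this remedy is not stated above. The macros rely on GNU C extensions (`__typeof__` and statement expressions).

```c
#include <stddef.h>

/*
 * Type-checked minimum: comparing the addresses of the two temporaries
 * makes the compiler warn about "comparison of distinct pointer types"
 * when a and b do not have the same type.
 */
#define MIN_CHECKED(a, b) ({        \
    __typeof__(a) _a = (a);         \
    __typeof__(b) _b = (b);         \
    (void)(&_a == &_b);             \
    _a < _b ? _a : _b; })

/* Minimum with an explicit common type, so no mismatch can occur. */
#define MIN_T(type, a, b) ({        \
    type _a = (a);                  \
    type _b = (b);                  \
    _a < _b ? _a : _b; })

/*
 * total_len is size_t and skb_len is unsigned int, as in the warning.
 * MIN_CHECKED(total_len, skb_len) would trigger it; MIN_T does not.
 */
size_t bytes_to_copy(size_t total_len, unsigned int skb_len)
{
    return MIN_T(size_t, total_len, skb_len);
}
```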
    • Linus Torvalds's avatar
      b732d968
  5. 20 Jun, 2008 4 commits
    • alpha: resurrect Cypress IDE quirk · a744e016
      Ivan Kokshaysky authored
      The quirk was removed in the hope that the generic legacy IDE quirk in
      drivers/pci/probe.c would be sufficient for Cypress IDE.
      It isn't, as this controller has non-standard BAR layout:
      secondary channel registers are in the BAR0-1 of the second
      PCI function - not in the BAR2-3 of the same function, as the
      generic quirk routine assumes.
      Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: fix compile failures with gcc-4.3 (bug #10438) · d559d4a2
      Ivan Kokshaysky authored
      The vast majority of these build failures are gcc-4.3 warnings
      about static functions and objects being referenced from
      non-static (read: "extern inline") functions, in conjunction
      with our -Werror.
      
      We cannot just convert "extern inline" to "static inline",
      as people keep suggesting all the time, because "extern inline"
      logic is crucial for generic kernel build.
      So
      - just make sure that all callees of critical "extern inline"
        functions are also "extern inline";
      - use "static inline", wherever it's possible.
      
      traps.c: work around gcc-4.3 being too smart about array
      bounds-checking.
      
      TODO: add "gnu_inline" attribute to all our "extern inline"
      functions to ensure desired behaviour with future compilers.
      Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: link failure fix · ede42692
      Ivan Kokshaysky authored
      With a built-in SCSI disk driver, the final link fails with the following
      error:
      `.exit.text' referenced in section `.rodata' of drivers/built-in.o:
      defined in discarded section `.exit.text' of drivers/built-in.o
      
      This happens with -Os (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) with all gcc-4
      versions, and also with -O2 and gcc-4.3.
      
      The problem is in sd.c:sd_major() being inlined into __exit function
      exit_sd(), and the compiler generating a jump table in .rodata section
      for the 'switch' statement in sd_major(). So we have references to
      discarded section.
      
      Fixed with a big hammer in the form of -fno-jump-tables.
      
      Note that jump tables vs. discarded sections is a generic problem,
      other architectures are just lucky not to suffer from it. But with
      a slightly more complex switch/case statement it can be reproduced
      on x86 as well. So maybe at some point we should consider
      -fno-jump-tables as a generic compile option...
      Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: fix module load failures on smp (bug #10926) · 9267b4b3
      Ivan Kokshaysky authored
      To calculate addresses of locally defined variables, GCC uses a 32-bit
      displacement from the GP. This doesn't work for per-cpu variables in
      modules, as the offset to the kernel per-cpu area is way above 4G.
      
      The workaround is to force allocation of a GOT entry for the per-cpu
      variable using an ldq instruction with a 'literal' relocation.
      I had to use a custom asm/percpu.h, as the required argument magic doesn't
      work with the asm-generic/percpu.h macros.
      Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>