1. 18 Jun, 2009 3 commits
    • Alexander van Heukelum's avatar
      x86: de-assembler-ize asm/desc.h · bc3f5d3d
      Alexander van Heukelum authored
      asm/desc.h is included in three assembly files, but the only macro
      it defines, GET_DESC_BASE, is never used. This patch removes the
      includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard
      from asm/desc.h.
      Signed-off-by: default avatarAlexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      bc3f5d3d
    • Alexander van Heukelum's avatar
      i386: fix/simplify espfix stack switching, move it into assembly · dc4c2a0a
      Alexander van Heukelum authored
      The espfix code triggers if we have a protected mode userspace
      application with a 16-bit stack. On returning to userspace, with iret,
      the CPU doesn't restore the high word of the stack pointer. This is an
      "official" bug, and the work-around used in the kernel is to temporarily
      switch to a 32-bit stack segment/pointer pair where the high word of the
      pointer is equal to the high word of the userspace stackpointer.
      
      The current implementation uses THREAD_SIZE to determine the cut-off,
      but there is no good reason not to use the more natural 64kb... However,
      implementing this by simply substituting THREAD_SIZE with 65536 in
      patch_espfix_desc crashed the test application. patch_espfix_desc tries
      to do what is described above, but gets it subtly wrong if the userspace
      stack pointer is just below a multiple of THREAD_SIZE: an overflow
      occurs to bit 13... With a bit of luck, when the kernelspace
      stackpointer is just below a 64kb-boundary, the overflow then ripples
      trough to bit 16 and userspace will see its stack pointer changed by
      65536.
      
      This patch moves all espfix code into entry_32.S. Selecting a 16-bit
      cut-off simplifies the code. The game with changing the limit dynamically
      is removed too. It complicates matters and I see no value in it. Changing
      only the top 16-bit word of ESP is one instruction and it also implies
      that only two bytes of the ESPFIX GDT entry need to be changed and this
      can be implemented in just a handful simple to understand instructions.
      As a side effect, the operation to compute the original ESP from the
      ESPFIX ESP and the GDT entry simplifies a bit too, and the remaining
      three instructions have been expanded inline in entry_32.S.
      
      impact: can now reliably run userspace with ESP=xxxxfffc on 16-bit
      stack segment
      Signed-off-by: default avatarAlexander van Heukelum <heukelum@fastmail.fm>
      Acked-by: default avatarStas Sergeev <stsp@aknet.ru>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      dc4c2a0a
    • Alexander van Heukelum's avatar
      i386: fix return to 16-bit stack from NMI handler · 2e04bc76
      Alexander van Heukelum authored
      Returning to a task with a 16-bit stack requires special care: the iret
      instruction does not restore the high word of esp in that case. The
      espfix code fixes this, but currently is not invoked on NMIs. This means
      that a running task gets the upper word of esp clobbered due intervening
      NMIs. To reproduce, compile and run the following program with the nmi
      watchdog enabled (nmi_watchdog=2 on the command line). Using gdb you can
      see that the high bits of esp contain garbage, while the low bits are
      still correct.
      
      This patch puts the espfix code back into the NMI code path.
      
      The patch is slightly complicated due to the irqtrace infrastructure not
      being NMI-safe. The NMI return path cannot call TRACE_IRQS_IRET.
      Otherwise, the tail of the normal iret-code is correct for the nmi code
      path too. To be able to share this code-path, the TRACE_IRQS_IRET was
      move up a bit. The espfix code exists after the TRACE_IRQS_IRET, but
      this code explicitly disables interrupts. This short interrupts-off
      section is now not traced anymore. The return-to-kernel path now always
      includes the preliminary test to decide if the espfix code should be
      called. This is never the case, but doing it this way keeps the patch as
      simple as possible and the few extra instructions should not affect
      timing in any significant way.
      
       #define _GNU_SOURCE
       #include <stdio.h>
       #include <sys/types.h>
       #include <sys/mman.h>
       #include <unistd.h>
       #include <sys/syscall.h>
       #include <asm/ldt.h>
      
      int modify_ldt(int func, void *ptr, unsigned long bytecount)
      {
              return syscall(SYS_modify_ldt, func, ptr, bytecount);
      }
      
      /* this is assumed to be usable */
       #define SEGBASEADDR 0x10000
       #define SEGLIMIT 0x20000
      
      /* 16-bit segment */
      struct user_desc desc = {
              .entry_number = 0,
              .base_addr = SEGBASEADDR,
              .limit = SEGLIMIT,
              .seg_32bit = 0,
              .contents = 0, /* ??? */
              .read_exec_only = 0,
              .limit_in_pages = 0,
              .seg_not_present = 0,
              .useable = 1
      };
      
      int main(void)
      {
              setvbuf(stdout, NULL, _IONBF, 0);
      
              /* map a 64 kb segment */
              char *pointer = mmap((void *)SEGBASEADDR, SEGLIMIT+1,
                              PROT_EXEC|PROT_READ|PROT_WRITE,
                              MAP_SHARED|MAP_ANONYMOUS, -1, 0);
              if (pointer == NULL) {
                      printf("could not map space\n");
                      return 0;
              }
      
              /* write ldt, new mode */
              int err = modify_ldt(0x11, &desc, sizeof(desc));
              if (err) {
                      printf("error modifying ldt: %i\n", err);
                      return 0;
              }
      
              for (int i=0; i<1000; i++) {
              asm volatile (
                      "pusha\n\t"
                      "mov %ss, %eax\n\t" /* preserve ss:esp */
                      "mov %esp, %ebp\n\t"
                      "push $7\n\t" /* index 0, ldt, user mode */
                      "push $65536-4096\n\t" /* esp */
                      "lss (%esp), %esp\n\t" /* switch to new stack */
                      "push %eax\n\t" /* save old ss:esp on new stack */
                      "push %ebp\n\t"
                      "add $17*65536, %esp\n\t" /* set high bits */
                      "mov %esp, %edx\n\t"
      
                      "mov $10000000, %ecx\n\t" /* wait... */
                      "1: loop 1b\n\t" /* ... a bit */
      
                      "cmp %esp, %edx\n\t"
                      "je 1f\n\t"
                      "ud2\n\t" /* esp changed inexplicably! */
                      "1:\n\t"
                      "sub $17*65536, %esp\n\t" /* restore high bits */
                      "lss (%esp), %esp\n\t" /* restore old ss:esp */
                      "popa\n\t");
      
                      printf("\rx%ix", i);
              }
      
              return 0;
      }
      Signed-off-by: default avatarAlexander van Heukelum <heukelum@fastmail.fm>
      Acked-by: default avatarStas Sergeev <stsp@aknet.ru>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      2e04bc76
  2. 17 Jun, 2009 12 commits
    • Cyrill Gorcunov's avatar
      x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present · 3f4c3955
      Cyrill Gorcunov authored
      Vegard Nossum reported:
      
      [  503.576724] ACPI: Preparing to enter system sleep state S5
      [  503.710857] Disabling non-boot CPUs ...
      [  503.716853] Power down.
      [  503.717770] ------------[ cut here ]------------
      [  503.717770] WARNING: at arch/x86/kernel/apic/apic.c:249 native_apic_write_du)
      [  503.717770] Hardware name: OptiPlex GX100
      [  503.717770] Modules linked in:
      [  503.717770] Pid: 2136, comm: halt Not tainted 2.6.30 #443
      [  503.717770] Call Trace:
      [  503.717770]  [<c154d327>] ? printk+0x18/0x1a
      [  503.717770]  [<c1017358>] ? native_apic_write_dummy+0x38/0x50
      [  503.717770]  [<c10360fc>] warn_slowpath_common+0x6c/0xc0
      [  503.717770]  [<c1017358>] ? native_apic_write_dummy+0x38/0x50
      [  503.717770]  [<c1036165>] warn_slowpath_null+0x15/0x20
      [  503.717770]  [<c1017358>] native_apic_write_dummy+0x38/0x50
      [  503.717770]  [<c1017173>] disconnect_bsp_APIC+0x63/0x100
      [  503.717770]  [<c1019e48>] disable_IO_APIC+0xb8/0xc0
      [  503.717770]  [<c1214231>] ? acpi_power_off+0x0/0x29
      [  503.717770]  [<c1015e55>] native_machine_shutdown+0x65/0x80
      [  503.717770]  [<c1015c36>] native_machine_power_off+0x26/0x30
      [  503.717770]  [<c1015c49>] machine_power_off+0x9/0x10
      [  503.717770]  [<c1046596>] kernel_power_off+0x36/0x40
      [  503.717770]  [<c104680d>] sys_reboot+0xfd/0x1f0
      [  503.717770]  [<c109daa0>] ? perf_swcounter_event+0xb0/0x130
      [  503.717770]  [<c109db7d>] ? perf_counter_task_sched_out+0x5d/0x120
      [  503.717770]  [<c102dfc6>] ? finish_task_switch+0x56/0xd0
      [  503.717770]  [<c154da1e>] ? schedule+0x49e/0xb40
      [  503.717770]  [<c10444b0>] ? sys_kill+0x70/0x160
      [  503.717770]  [<c119d9db>] ? selinux_file_ioctl+0x3b/0x50
      [  503.717770]  [<c10dd443>] ? sys_ioctl+0x63/0x70
      [  503.717770]  [<c1003024>] sysenter_do_call+0x12/0x22
      [  503.717770] ---[ end trace 8157b5d0ed378f15 ]---
      
      |
      | That's including this commit:
      |
      | commit 103428e5
      |Author: Cyrill Gorcunov <gorcunov@openvz.org>
      |Date:   Sun Jun 7 16:48:40 2009 +0400
      |
      |    x86, apic: Fix dummy apic read operation together with broken MP handling
      |
      
      If we have apic disabled we don't even switch to APIC mode and do not
      calling for connect_bsp_APIC. Though on SMP compiled kernel the
      native_machine_shutdown does try to write the apic register anyway.
      
      Fix it with explicit check if we really should touch apic registers.
      Reported-by: default avatarVegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <20090617181322.GG10822@lenovo>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      3f4c3955
    • Huang Weiyi's avatar
      x86: Remove duplicated #include's · 8653f88f
      Huang Weiyi authored
      Signed-off-by: default avatarHuang Weiyi <weiyi.huang@gmail.com>
      LKML-Reference: <1244895686-2348-1-git-send-email-weiyi.huang@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8653f88f
    • Jaswinder Singh Rajput's avatar
      x86: msr.h linux/types.h is only required for __KERNEL__ · 8fa62ad9
      Jaswinder Singh Rajput authored
      <linux/types.h> is only required for __KERNEL__ as whole file is covered with it
      
      Also fixed some spacing issues for usr/include/asm-x86/msr.h
      Signed-off-by: default avatarJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: "H. Peter Anvin" <hpa@kernel.org>
      LKML-Reference: <1245228070.2662.1.camel@ht.satnam>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8fa62ad9
    • Prarit Bhargava's avatar
      x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround · fe955e5c
      Prarit Bhargava authored
      Expand Intel NMI perfctr1 workaround to include a Core2 processor stepping
      (cpuid family-6, model-f, stepping-4).  Resolves a situation where the NMI
      would not enable on these processors.
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Acked-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: prarit@redhat.com
      Cc: suresh.b.siddha@intel.com
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      fe955e5c
    • Jaswinder Singh Rajput's avatar
    • Figo.zhang's avatar
      x86, io_apic.c: Work around compiler warning · 50a8d4d2
      Figo.zhang authored
      This compiler warning:
      
        arch/x86/kernel/apic/io_apic.c: In function ‘ioapic_write_entry’:
        arch/x86/kernel/apic/io_apic.c:466: warning: ‘eu’ is used uninitialized in this function
        arch/x86/kernel/apic/io_apic.c:465: note: ‘eu’ was declared here
      
      Is bogus as 'eu' is always initialized. But annotate it away by
      initializing the variable, to make it easier for people to notice
      real warnings. A compiler that sees through this logic will
      optimize away the initialization.
      Signed-off-by: default avatarFigo.zhang <figo1802@gmail.com>
      LKML-Reference: <1245248720.3312.27.camel@myhost>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      50a8d4d2
    • Cyrill Gorcunov's avatar
      x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present · 5ce4243d
      Cyrill Gorcunov authored
      If APIC was disabled (for some reason) and as result
      it's not even mapped we should not try to enable thermal
      interrupts at all.
      Reported-by: default avatarSimon Holm Thøgersen <odie@cs.aau.dk>
      Tested-by: default avatarSimon Holm Thøgersen <odie@cs.aau.dk>
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      LKML-Reference: <20090615182633.GA7606@lenovo>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      5ce4243d
    • Andi Kleen's avatar
      x86: mce: Handle banks == 0 case in K7 quirk · 203abd67
      Andi Kleen authored
      Vegard Nossum reported:
      
      > I get an MCE-related crash like this in latest linus tree:
      >
      > [    0.115341] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
      > [    0.116396] CPU: L2 Cache: 512K (64 bytes/line)
      > [    0.120570] mce: CPU supports 0 MCE banks
      > [    0.124870] BUG: unable to handle kernel NULL pointer dereference at 00000000 00000010
      > [    0.128001] IP: [<ffffffff813b98ad>] mcheck_init+0x278/0x320
      > [    0.128001] PGD 0
      > [    0.128001] Thread overran stack, or stack corrupted
      > [    0.128001] Oops: 0002 [#1] PREEMPT SMP
      > [    0.128001] last sysfs file:
      > [    0.128001] CPU 0
      > [    0.128001] Modules linked in:
      > [    0.128001] Pid: 0, comm: swapper Not tainted 2.6.30 #426
      > [    0.128001] RIP: 0010:[<ffffffff813b98ad>]  [<ffffffff813b98ad>] mcheck_init+0x278/0x320
      > [    0.128001] RSP: 0018:ffffffff81595e38  EFLAGS: 00000246
      > [    0.128001] RAX: 0000000000000010 RBX: ffffffff8158f900 RCX: 0000000000000000
      > [    0.128001] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000000000010
      > [    0.128001] RBP: ffffffff81595e68 R08: 0000000000000001 R09: 0000000000000000
      > [    0.128001] R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000
      > [    0.128001] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
      > [    0.128001] FS:  0000000000000000(0000) GS:ffff880002288000(0000) knlGS:00000
      > 00000000000
      > [    0.128001] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      > [    0.128001] CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006b0
      > [    0.128001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > [    0.128001] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
      > [    0.128001] Process swapper (pid: 0, threadinfo ffffffff81594000, task ffffff
      > ff8152a4a0)
      > [    0.128001] Stack:
      > [    0.128001]  0000000081595e68 5aa50ed3b4ddbe6e ffffffff8158f900 ffffffff8158f
      > 914
      > [    0.128001]  ffffffff8158f948 0000000000000000 ffffffff81595eb8 ffffffff813b8
      > 69c
      > [    0.128001]  5aa50ed3b4ddbe6e 00000001078bfbfd 0000062300000800 5aa50ed3b4ddb
      > e6e
      > [    0.128001] Call Trace:
      > [    0.128001]  [<ffffffff813b869c>] identify_cpu+0x331/0x392
      > [    0.128001]  [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e
      > [    0.128001]  [<ffffffff815a14ac>] check_bugs+0x1c/0x60
      > [    0.128001]  [<ffffffff8159c075>] start_kernel+0x403/0x46e
      > [    0.128001]  [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5
      > [    0.128001]  [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b
      > [    0.128001]  [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71
      
      This happens on QEMU which reports MCA capability, but no banks.
      Without this patch there is a buffer overrun and boot ops because
      the code would try to initialize the 0 element of a zero length
      kmalloc() buffer.
      Reported-by: default avatarVegard Nossum <vegard.nossum@gmail.com>
      Tested-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      LKML-Reference: <20090615125200.GD31969@one.firstfloor.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      203abd67
    • Ingo Molnar's avatar
      Merge branch 'linus' into x86/urgent · cc4949e1
      Ingo Molnar authored
      Merge reason: pull in latest to fix a bug in it.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cc4949e1
    • H. Peter Anvin's avatar
      x86, boot: use .code16gcc instead of .code16 · 28b48688
      H. Peter Anvin authored
      Use .code16gcc to compile arch/x86/boot/bioscall.S rather than
      .code16, since some older versions of binutils can't generate 32-bit
      addressing expressions (67 prefixes) in .code16 mode, only in
      .code16gcc mode.
      Reported-by: default avatarTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      28b48688
    • Cliff Wickman's avatar
      x86: correct the conversion of EFI memory types · e2a71476
      Cliff Wickman authored
      This patch causes all the EFI_RESERVED_TYPE memory reservations to be recorded
      in the e820 table as type E820_RESERVED.
      
      (This patch replaces one called 'x86: vendor reserved memory type'.
       This version has been discussed a bit with Peter and Yinghai but not given
       a final opinion.)
      
      Without this patch EFI_RESERVED_TYPE memory reservations may be
      marked usable in the e820 table. There may be a collision between
      kernel use and some reserver's use of this memory.
      
      (An example use of this functionality is the UV system, which
       will access extremely large areas of memory with a memory engine
       that allows a user to address beyond the processor's range.  Such
       areas are reserved in the EFI table by the BIOS.
       Some loaders have a restricted number of entries possible in the e820 table,
       hence the need to record the reservations in the unrestricted EFI table.)
      
      The call to do_add_efi_memmap() is only made if "add_efi_memmap" is specified
      on the kernel command line.
      Signed-off-by: default avatarCliff Wickman <cpw@sgi.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      e2a71476
    • H. Peter Anvin's avatar
      x86: cap iomem_resource to addressable physical memory · 95ee14e4
      H. Peter Anvin authored
      iomem_resource is by default initialized to -1, which means 64 bits of
      physical address space if 64-bit resources are enabled.  However, x86
      CPUs cannot address 64 bits of physical address space.  Thus, we want
      to cap the physical address space to what the union of all CPU can
      actually address.
      
      Without this patch, we may end up assigning inaccessible values to
      uninitialized 64-bit PCI memory resources.
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Martin Mares <mj@ucw.cz>
      Cc: stable@kernel.org
      95ee14e4
  3. 16 Jun, 2009 25 commits