1. 01 Apr, 2010 5 commits
    • x86, hpet: Erratum workaround for read after write of HPET comparator · 8da854cb
      Pallipadi, Venkatesh authored
      On Wed, Feb 24, 2010 at 03:37:04PM -0800, Justin Piszcz wrote:
      > Hello,
      >
      > Again, on the Intel DP55KG board:
      >
      > # uname -a
      > Linux host 2.6.33 #1 SMP Wed Feb 24 18:31:00 EST 2010 x86_64 GNU/Linux
      >
      > [    1.237600] ------------[ cut here ]------------
      > [    1.237890] WARNING: at arch/x86/kernel/hpet.c:404 hpet_next_event+0x70/0x80()
      > [    1.238221] Hardware name:
      > [    1.238504] hpet: compare register read back failed.
      > [    1.238793] Modules linked in:
      > [    1.239315] Pid: 0, comm: swapper Not tainted 2.6.33 #1
      > [    1.239605] Call Trace:
      > [    1.239886]  <IRQ>  [<ffffffff81056c13>] ? warn_slowpath_common+0x73/0xb0
      > [    1.240409]  [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
      > [    1.240699]  [<ffffffff81056cb0>] ? warn_slowpath_fmt+0x40/0x50
      > [    1.240992]  [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
      > [    1.241281]  [<ffffffff81041ad0>] ? hpet_next_event+0x70/0x80
      > [    1.241573]  [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
      > [    1.241859]  [<ffffffff81078e32>] ? tick_handle_oneshot_broadcast+0xe2/0x100
      > [    1.246533]  [<ffffffff8102a67a>] ? timer_interrupt+0x1a/0x30
      > [    1.246826]  [<ffffffff81085499>] ? handle_IRQ_event+0x39/0xd0
      > [    1.247118]  [<ffffffff81087368>] ? handle_edge_irq+0xb8/0x160
      > [    1.247407]  [<ffffffff81029f55>] ? handle_irq+0x15/0x20
      > [    1.247689]  [<ffffffff810294a2>] ? do_IRQ+0x62/0xe0
      > [    1.247976]  [<ffffffff8146be53>] ? ret_from_intr+0x0/0xa
      > [    1.248262]  <EOI>  [<ffffffff8102f277>] ? mwait_idle+0x57/0x80
      > [    1.248796]  [<ffffffff8102645c>] ? cpu_idle+0x5c/0xb0
      > [    1.249080] ---[ end trace db7f668fb6fef4e1 ]---
      >
      > Is this something Intel has to fix or is it a bug in the kernel?
      
      This is a chipset erratum.
      
      Thomas: You mentioned we can retain this check only for known-buggy and
      hpet debug kind of options. But here is the simple workaround patch for
      this particular erratum.
      
      Some chipsets have an erratum due to which a read immediately following a
      write of the HPET comparator returns the old comparator value instead of the
      most recently written value.
      
      Erratum 15 in
      "Intel I/O Controller Hub 9 (ICH9) Family Specification Update"
      (http://www.intel.com/assets/pdf/specupdate/316973.pdf)
      
      The workaround for the erratum is to read the comparator a second time if
      the first read does not return the written value.
      
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      LKML-Reference: <20100225185348.GA9674@linux-os.sc.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com>
      Cc: <stable@kernel.org>
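The read-twice workaround described in the HPET commit can be sketched in user space. This is a minimal model under stated assumptions, not the kernel code: `struct fake_hpet`, `fake_hpet_write()`, `fake_hpet_read()` and `set_comparator_checked()` are hypothetical names, and the "first read after a write returns the old value" behaviour is simulated with a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a comparator with the read-after-write erratum:
 * the first read after a write returns the previous value. */
struct fake_hpet {
    uint32_t cmp;       /* value most recently written */
    uint32_t old_cmp;   /* value before the last write */
    bool stale_read;    /* true until the first read after a write */
};

static void fake_hpet_write(struct fake_hpet *h, uint32_t val)
{
    assert(h != NULL);
    h->old_cmp = h->cmp;
    h->cmp = val;
    h->stale_read = true;
}

static uint32_t fake_hpet_read(struct fake_hpet *h)
{
    assert(h != NULL);
    if (h->stale_read) {
        h->stale_read = false;
        return h->old_cmp;   /* simulated erratum: stale value */
    }
    return h->cmp;
}

/* Write the comparator and verify it; read a second time if the first
 * read-back does not match. Returns true when the value is confirmed. */
static bool set_comparator_checked(struct fake_hpet *h, uint32_t val)
{
    fake_hpet_write(h, val);
    if (fake_hpet_read(h) == val)
        return true;
    return fake_hpet_read(h) == val;
}
```

With this model, a single read-back check would report a failure after every write, while the second read confirms the value that was actually latched.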
    • bootmem, x86: Fix 32bit numa system without RAM on node 0 · aa235fc7
      Yinghai Lu authored
      When 32-bit NUMA is used, free_all_bootmem() still only walks node id 0.
      
      If node 0 doesn't have RAM installed, the lowest populated node
      becomes low RAM.
      
      This patch fixes the BOOTMEM path by iterating over bdata_list.
      
      -v3: add more comments, and fix bootmem path too.
      -v4: separate from one big patch
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4BB416D7.6090203@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
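The difference between freeing a single node and walking the whole list can be illustrated with a small stand-in. This is only a sketch: `struct node_bootmem` and both functions are hypothetical and do not mirror the kernel's bootmem data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-node bootmem descriptor. */
struct node_bootmem {
    int nid;
    unsigned long free_pages;
    struct node_bootmem *next;
};

/* Release only the pages recorded for one node id. */
static unsigned long free_node_pages(struct node_bootmem *list, int nid)
{
    unsigned long total = 0;

    for (struct node_bootmem *b = list; b != NULL; b = b->next) {
        if (b->nid == nid) {
            total += b->free_pages;
            b->free_pages = 0;
        }
    }
    return total;
}

/* Release the pages of every descriptor on the list. */
static unsigned long free_all_nodes(struct node_bootmem *list)
{
    unsigned long total = 0;

    for (struct node_bootmem *b = list; b != NULL; b = b->next) {
        total += b->free_pages;
        b->free_pages = 0;
    }
    return total;
}
```

When node 0 holds no RAM, `free_node_pages(list, 0)` releases nothing, whereas `free_all_nodes(list)` picks up the memory that lives on the populated node.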
    • nobootmem, x86: Fix 32bit numa system without RAM on node 0 · 33799858
      Yinghai Lu authored
      On a system without RAM on node 0, a 32-bit NUMA kernel produced the
      following boot dump:
      
      early_node_map[4] active PFN ranges
          1: 0x00000010 -> 0x00000099
          1: 0x00000100 -> 0x0007da00
          1: 0x0007e800 -> 0x0007ffa0
          1: 0x0007ffae -> 0x0007ffb0
      ...
      Subtract (29 early reservations)
        #000 [0000001000 - 0000002000]
        #001 [0000089000 - 000008f000]
        #002 [0000091000 - 0000093500]
      ...
        #027 [007cbfef40 - 007e800000]
        #028 [007e9ca000 - 007ff95000]
      (0 free memory ranges)
      Initializing HighMem for node 0 (00000000:00000000)
      Initializing HighMem for node 1 (00000000:00000000)
      Memory: 0k/2096832k available (6662k kernel code, 2096300k reserved, 4829k data, 484k init, 0k highmem)
      ...
      Checking if this processor honours the WP bit even in supervisor mode...Ok.
      swapper: page allocation failure. order:0, mode:0x0
      Pid: 0, comm: swapper Not tainted 2.6.34-rc3-tip-03818-g4b1ea6c-dirty #35
      Call Trace:
       [<4087a5dc>] ? printk+0xf/0x11
       [<40286728>] __alloc_pages_nodemask+0x417/0x487
       [<402a9ce1>] new_slab+0xe2/0x1fe
       [<402aa5b2>] kmem_cache_open+0x185/0x358
       [<402abbc0>] T.954+0x1c/0x60
       [<40d52a29>] kmem_cache_init+0x24/0x113
       [<40d39738>] start_kernel+0x166/0x2e4
       [<40d3940e>] ? unknown_bootoption+0x0/0x18e
       [<40d390ce>] i386_start_kernel+0xce/0xd5
      Mem-Info:
      Node 1 DMA per-cpu:
      CPU    0: hi:    0, btch:   1 usd:   0
      Node 1 Normal per-cpu:
      CPU    0: hi:    0, btch:   1 usd:   0
      active_anon:0 inactive_anon:0 isolated_anon:0
       active_file:0 inactive_file:0 isolated_file:0
       unevictable:0 dirty:0 writeback:0 unstable:0
       free:0 slab_reclaimable:0 slab_unreclaimable:0
       mapped:0 shmem:0 pagetables:0 bounce:0
      
      When 32-bit NUMA is used, free_all_bootmem() still only walks node id 0.
      
      If node 0 doesn't have RAM installed, we need to go with node 1,
      because early_node_map still uses 1 for all ranges, and RAM from node 1
      becomes low RAM.
      
      Use MAX_NUMNODES like 64-bit NUMA does.
      
      Note: the BOOTMEM path has the same problem.
            This bug existed before we had NO_BOOTMEM support.
      
      -v3: add more comments, and fix bootmem path too.
      -v4: separate bootmem path fix
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4BB41689.9090502@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
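Why restricting the walk to node 0 finds no memory can be checked against the four PFN ranges in the boot dump, which all carry node id 1. The sketch below is an illustration only: `count_pages()` is a hypothetical helper, the value chosen for `MAX_NUMNODES` is an assumption, and treating `MAX_NUMNODES` as "match any node" is modelled here rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMNODES 8   /* assumed value; used below as "any node" */

struct pfn_range {
    int nid;
    unsigned long start_pfn;
    unsigned long end_pfn;   /* exclusive */
};

/* The four active PFN ranges from the boot dump, all on node 1. */
static const struct pfn_range early_map[] = {
    { 1, 0x00000010, 0x00000099 },
    { 1, 0x00000100, 0x0007da00 },
    { 1, 0x0007e800, 0x0007ffa0 },
    { 1, 0x0007ffae, 0x0007ffb0 },
};

/* Count pages in ranges that belong to nid; MAX_NUMNODES matches all. */
static unsigned long count_pages(int nid)
{
    unsigned long pages = 0;
    size_t n = sizeof(early_map) / sizeof(early_map[0]);

    assert(nid >= 0 && nid <= MAX_NUMNODES);
    for (size_t i = 0; i < n; i++) {
        if (nid != MAX_NUMNODES && early_map[i].nid != nid)
            continue;
        pages += early_map[i].end_pfn - early_map[i].start_pfn;
    }
    return pages;
}
```

In this model `count_pages(0)` is zero, which corresponds to the "(0 free memory ranges)" line in the dump, while `count_pages(MAX_NUMNODES)` covers every range.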
    • x86: Handle overlapping mptables · 909fc87b
      Andi Kleen authored
      We found a system where the MP table MPC and MPF structures overlap.
      
      That doesn't really matter because the mptable is not used anyways with ACPI,
      but it leads to a panic in the early allocator due to the overlapping
      reservations in 2.6.33.
      
      Earlier kernels handled this without problems.
      
      Simply change these reservations to reserve_early_overlap_ok to avoid
      the panic.
      
      Reported-by: Thomas Renninger <trenn@suse.de>
      Tested-by: Thomas Renninger <trenn@suse.de>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <20100329074111.GA22821@basil.fritz.box>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: <stable@kernel.org>
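The contrast between a reservation call that rejects overlaps and one that tolerates them can be sketched as follows. This is a simplified model, not the early allocator: `reserve_strict()`, `reserve_overlap_ok()` and the table are hypothetical, the strict variant returns -1 where the kernel would panic, and merging overlapping entries is one possible tolerant policy chosen for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RES 16

struct res { uint64_t start, end; };   /* half-open range [start, end) */

static struct res res_table[MAX_RES];
static int nr_res;

static bool overlaps(const struct res *r, uint64_t start, uint64_t end)
{
    return start < r->end && r->start < end;
}

/* Strict variant: refuse the request if it overlaps an existing entry. */
static int reserve_strict(uint64_t start, uint64_t end)
{
    assert(start < end);
    for (int i = 0; i < nr_res; i++)
        if (overlaps(&res_table[i], start, end))
            return -1;
    if (nr_res == MAX_RES)
        return -1;
    res_table[nr_res++] = (struct res){ start, end };
    return 0;
}

/* Tolerant variant: absorb every overlapping entry into one merged range. */
static int reserve_overlap_ok(uint64_t start, uint64_t end)
{
    int i = 0;

    assert(start < end);
    while (i < nr_res) {
        if (overlaps(&res_table[i], start, end)) {
            if (res_table[i].start < start)
                start = res_table[i].start;
            if (res_table[i].end > end)
                end = res_table[i].end;
            res_table[i] = res_table[--nr_res];   /* drop absorbed entry */
        } else {
            i++;
        }
    }
    if (nr_res == MAX_RES)
        return -1;
    res_table[nr_res++] = (struct res){ start, end };
    return 0;
}
```

Two overlapping structures, such as the MPF and MPC described above, would trip the strict path on the second call but end up as a single merged entry on the tolerant path.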
    • x86: Make e820_remove_range to handle all covered case · 9f3a5f52
      Yinghai Lu authored
      Rusty found that on lguest with trim_bios_range, max_pfn is no longer right,
      and it looks like e820_remove_range does not work correctly.
      
      [    0.000000] BIOS-provided physical RAM map:
      [    0.000000]  LGUEST: 0000000000000000 - 0000000004000000 (usable)
      [    0.000000] Notice: NX (Execute Disable) protection missing in CPU or disabled in BIOS!
      [    0.000000] DMI not present or invalid.
      [    0.000000] last_pfn = 0x3fa0 max_arch_pfn = 0x100000
      [    0.000000] init_memory_mapping: 0000000000000000-0000000003fa0000
      
      The root cause is that e820_remove_range() doesn't handle the all-covered
      case, where the range to remove lies entirely inside a single entry.
      e820_remove_range(BIOS_START, BIOS_END - BIOS_START, ...)
      produces a bogus range as a result.
      
      Make it match e820_update_range() by handling that case too.
      
      Reported-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Tested-by: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <4BB18E55.6090903@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
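The cases a range-removal routine has to distinguish, including the one where the removed range sits entirely inside one entry, can be sketched like this. It is an illustration under assumptions, not the kernel function: `struct map` and `map_remove_range()` are hypothetical, the sketch takes an exclusive end address rather than a start and size, and the BIOS hole values 0xa0000 and 0x100000 used in the note below are assumed rather than quoted from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ENTRIES 8

struct range { uint64_t start, end; };   /* half-open range [start, end) */

struct map {
    struct range e[MAX_ENTRIES];
    int nr;
};

/* Remove [start, end) from every entry; returns the number of bytes removed. */
static uint64_t map_remove_range(struct map *m, uint64_t start, uint64_t end)
{
    uint64_t removed = 0;
    int n = m->nr;   /* entries appended by a split are not rescanned */

    assert(start < end);
    for (int i = 0; i < n; i++) {
        struct range *r = &m->e[i];

        if (r->start >= end || r->end <= start)
            continue;                        /* no overlap */
        if (r->start >= start && r->end <= end) {
            removed += r->end - r->start;    /* entry fully inside: empty it */
            r->start = r->end = 0;
            continue;
        }
        if (r->start < start && r->end > end) {
            /* removal range fully inside the entry: split it in two */
            assert(m->nr < MAX_ENTRIES);
            m->e[m->nr++] = (struct range){ end, r->end };
            r->end = start;
            removed += end - start;
            continue;
        }
        if (r->start < start) {              /* overlap at the entry's tail */
            removed += r->end - start;
            r->end = start;
        } else {                             /* overlap at the entry's head */
            removed += end - r->start;
            r->start = end;
        }
    }
    return removed;
}
```

For the single LGUEST entry 0x0 - 0x4000000 shown above, removing the assumed hole 0xa0000 - 0x100000 takes the split branch and leaves two entries, one below and one above the hole.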
  2. 30 Mar, 2010 1 commit
    • x86-32, resume: do a global tlb flush in S4 resume · 8ae06d22
      Shaohua Li authored
      Colin King reported a strange oops in the S4 resume code path (see below). The
      test system has an i5/i7 CPU. The kernel doesn't enable PAE, so 4M pages are
      used. The oops always happens at virtual address 0xc03ff000, which is mapped
      to the last 4k of the first 4M of memory. Doing a global tlb flush fixes the
      issue.
      
      EIP: 0060:[<c0493a01>] EFLAGS: 00010086 CPU: 0
      EIP is at copy_loop+0xe/0x15
      EAX: 36aeb000 EBX: 00000000 ECX: 00000400 EDX: f55ad46c
      ESI: 0f800000 EDI: c03ff000 EBP: f67fbec4 ESP: f67fbea8
      DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      ...
      ...
      CR2: 00000000c03ff000
      
      Tested-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      LKML-Reference: <20100305005932.GA22675@sli10-desk.sh.intel.com>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: <stable@kernel.org>
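The statement that 0xc03ff000 maps to the last 4k of the first 4M can be checked with a little arithmetic. The sketch assumes the common x86-32 direct mapping that starts at 0xc0000000; that constant and the helper names are assumptions for illustration and do not come from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xc0000000u   /* assumed start of the kernel direct map */
#define SZ_4K       0x1000u
#define SZ_4M       0x400000u

/* Offset of a lowmem kernel virtual address from the start of the direct map. */
static uint32_t lowmem_offset(uint32_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET);
    return vaddr - PAGE_OFFSET;
}

/* Index of the 4M large page that covers the address. */
static uint32_t large_page_index(uint32_t vaddr)
{
    return lowmem_offset(vaddr) / SZ_4M;
}

/* Non-zero when the address lies in the last 4K of its 4M large page. */
static int in_last_4k_of_large_page(uint32_t vaddr)
{
    return lowmem_offset(vaddr) % SZ_4M >= SZ_4M - SZ_4K;
}
```

Under that assumption the faulting address has offset 0x3ff000, which is 4M minus 4K, so it falls in large page 0 and in that page's final 4K.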
  3. 29 Mar, 2010 3 commits
    • x86: Do not free zero sized per cpu areas · eed63519
      Ian Campbell authored
      This avoids an infinite loop in free_early_partial().
      
      Add a warning to free_early_partial() to catch future problems.
      
      -v5: put start > end back into WARN_ONCE()
      -v6: use one line for warning, suggested by Linus
      -v7: more tests
      -v8: remove the function name as suggested by Johannes
           WARN_ONCE() will print out that function name.
      
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Tested-by: Joel Becker <joel.becker@oracle.com>
      Tested-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1269830604-26214-4-git-send-email-yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
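A guard of the kind this commit describes, skipping empty ranges and warning once about inverted ones, might look like the sketch below. Everything here is hypothetical: `free_range_checked()`, the `warned` flag standing in for warn-once behaviour, and the choice to return silently when start equals end are assumptions, not the kernel's free_early_partial().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PG_SIZE 4096u   /* models a 4K page size */

static int warned;      /* mimics warn-once behaviour */

/* Walk [start, end) page by page and return the number of pages visited.
 * A zero-sized range is skipped; an inverted range is reported once. */
static unsigned long free_range_checked(uint64_t start, uint64_t end)
{
    unsigned long pages = 0;

    if (start == end)
        return 0;                /* nothing to free */
    if (start > end) {
        if (!warned) {
            fprintf(stderr, "wrong range [%#llx, %#llx]\n",
                    (unsigned long long)start, (unsigned long long)end);
            warned = 1;
        }
        return 0;
    }
    for (uint64_t addr = start; addr < end; addr += PG_SIZE)
        pages++;                 /* a real implementation would release the page */
    return pages;
}
```

Checking the range before entering any loop keeps a degenerate request from reaching code that assumes start is below end.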
    • x86: Make sure free_init_pages() frees pages on page boundary · c967da6a
      Yinghai Lu authored
      When CONFIG_NO_BOOTMEM=y, memory can be used more efficiently, or
      in a more compact fashion.
      
      Example:
      
       Allocated new RAMDISK: 00ec2000 - 0248ce57
       Move RAMDISK from 000000002ea04000 - 000000002ffcee56 to 00ec2000 - 0248ce56
      
      The new RAMDISK's end is not page aligned.
      The last page could be shared with other users.
      
      When free_init_pages() is called for the initrd or .init, that page
      could be freed, corrupting other data.
      
      Code segment in free_init_pages():
      
       |        for (; addr < end; addr += PAGE_SIZE) {
       |                ClearPageReserved(virt_to_page(addr));
       |                init_page_count(virt_to_page(addr));
       |                memset((void *)(addr & ~(PAGE_SIZE-1)),
       |                        POISON_FREE_INITMEM, PAGE_SIZE);
       |                free_page(addr);
       |                totalram_pages++;
       |        }
      
      The last partial page could be handed out as one whole free page.
      
      So page-align the boundaries.
      
      -v2: make the original initramdisk aligned, according to
           Johannes, otherwise we have the chance to lose one page.
           We still need to keep initrd_end unaligned, otherwise it could
           confuse the decompressor.
      -v3: change to WARN_ON instead, suggested by Johannes.
      -v4: use PAGE_ALIGN, suggested by Johannes.
           We may fix that macro name later to PAGE_ALIGN_UP and PAGE_ALIGN_DOWN.
           Add comments about assuming the ramdisk start is aligned
           in relocate_initrd(); re-read ramdisk_image instead of saving it
           to make the diff smaller. Add a warning for a wrong range, suggested by Johannes.
      -v6: remove one WARN()
           We need to align the beginning in free_init_pages()
           Do not copy more than ramdisk_size, noticed by Johannes
      
      Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Tested-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1269830604-26214-3-git-send-email-yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
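The alignment arithmetic behind this commit can be tried on the RAMDISK addresses from the example. The sketch below is an assumption-laden illustration: the helpers and `clamp_to_whole_pages()` are hypothetical names, a 4K page is assumed, and shrinking the range inward so that only whole pages are freed is one conservative policy, not necessarily what the patch does in every caller.

```c
#include <assert.h>

#define PG_SIZE 4096ul              /* models a 4K page size */
#define PG_MASK (~(PG_SIZE - 1))

/* Round an address up to the next page boundary. */
static unsigned long page_align_up(unsigned long addr)
{
    return (addr + PG_SIZE - 1) & PG_MASK;
}

/* Round an address down to the containing page boundary. */
static unsigned long page_align_down(unsigned long addr)
{
    return addr & PG_MASK;
}

/* Shrink [*begin, *end) inward to whole pages.
 * Returns 1 if either boundary had to be adjusted. */
static int clamp_to_whole_pages(unsigned long *begin, unsigned long *end)
{
    assert(*begin <= *end);

    unsigned long b = page_align_up(*begin);
    unsigned long e = page_align_down(*end);
    int adjusted = (b != *begin) || (e != *end);

    if (e < b)
        e = b;   /* the range held no whole page */
    *begin = b;
    *end = e;
    return adjusted;
}
```

For the relocated RAMDISK at 00ec2000 - 0248ce57, the start is already aligned while the end is not: rounding the end up gives 0x0248d000, and shrinking inward stops at 0x0248c000 so the shared last page is left alone.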
    • x86: Make smp_locks end with page alignment · 596b711e
      Yinghai Lu authored
      Fix:
      
       ------------[ cut here ]------------
       WARNING: at arch/x86/mm/init.c:342 free_init_pages+0x4c/0xfa()
       free_init_pages: range [0x40daf000, 0x40db5c24] is not aligned
       Modules linked in:
       Pid: 0, comm: swapper Not tainted 2.6.34-rc2-tip-03946-g4f16b23-dirty #50
       Call Trace:
        [<40232e9f>] warn_slowpath_common+0x65/0x7c
        [<4021c9f0>] ? free_init_pages+0x4c/0xfa
        [<40881434>] ? _etext+0x0/0x24
        [<40232eea>] warn_slowpath_fmt+0x24/0x27
        [<4021c9f0>] free_init_pages+0x4c/0xfa
        [<40881434>] ? _etext+0x0/0x24
        [<40d3f4bd>] alternative_instructions+0xf6/0x100
        [<40d3fe4f>] check_bugs+0xbd/0xbf
        [<40d398a7>] start_kernel+0x2d5/0x2e4
        [<40d390ce>] i386_start_kernel+0xce/0xd5
       ---[ end trace 4eaa2a86a8e2da22 ]---
      
      Comments in vmlinux.lds.S already say:
      
       |        /*
       |         * smp_locks might be freed after init
       |         * start/end must be page aligned
       |         */
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1269830604-26214-2-git-send-email-yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 26 Mar, 2010 13 commits
  5. 25 Mar, 2010 18 commits