1. 15 Apr, 2011 1 commit
    • KOSAKI Motohiro's avatar
      x86, NUMA: Fix fakenuma boot failure · 7d6b4670
      KOSAKI Motohiro authored
      Currently, numa=fake boot parameter is broken. If it's used,
      kernel may panic due to devide by zero error depending on CPU
      configuration
      
      Call Trace:
       [<ffffffff8104ad4c>] find_busiest_group+0x38c/0xd30
       [<ffffffff81086aff>] ? local_clock+0x6f/0x80
       [<ffffffff81050533>] load_balance+0xa3/0x600
       [<ffffffff81050f53>] idle_balance+0xf3/0x180
       [<ffffffff81550092>] schedule+0x722/0x7d0
       [<ffffffff81550538>] ? wait_for_common+0x128/0x190
       [<ffffffff81550a65>] schedule_timeout+0x265/0x320
       [<ffffffff81095815>] ? lock_release_holdtime+0x35/0x1a0
       [<ffffffff81550538>] ? wait_for_common+0x128/0x190
       [<ffffffff8109bb6c>] ? __lock_release+0x9c/0x1d0
       [<ffffffff815534e0>] ? _raw_spin_unlock_irq+0x30/0x40
       [<ffffffff815534e0>] ? _raw_spin_unlock_irq+0x30/0x40
       [<ffffffff81550540>] wait_for_common+0x130/0x190
       [<ffffffff81051920>] ? try_to_wake_up+0x510/0x510
       [<ffffffff8155067d>] wait_for_completion+0x1d/0x20
       [<ffffffff8107f36c>] kthread_create_on_node+0xac/0x150
       [<ffffffff81077bb0>] ? process_scheduled_works+0x40/0x40
       [<ffffffff8155045f>] ? wait_for_common+0x4f/0x190
       [<ffffffff8107a283>] __alloc_workqueue_key+0x1a3/0x590
       [<ffffffff81e0cce2>] cpuset_init_smp+0x6b/0x7b
       [<ffffffff81df3d07>] kernel_init+0xc3/0x182
       [<ffffffff8155d5e4>] kernel_thread_helper+0x4/0x10
       [<ffffffff81553cd4>] ? retint_restore_args+0x13/0x13
       [<ffffffff81df3c44>] ? start_kernel+0x400/0x400
       [<ffffffff8155d5e0>] ? gs_change+0x13/0x13
      
      The divede by zero is caused by the following line,
      group->cpu_power==0:
      
       kernel/sched_fair.c::update_sg_lb_stats()
              /* Adjust by relative CPU power of the group */
              sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
      
      This regression was caused by commit e23bba60 ("x86-64, NUMA: Unify
      emulated distance mapping") because it changes cpu -> node
      mapping in the process of dropping fake_physnodes().
      
        old) all cpus are assinged node 0
        now) cpus are assigned round robin
             (the logic is implemented by numa_init_array())
      
        Note: The change in behavior only happens if the system doesn't
              have neither ACPI SRAT table nor AMD northbridge NUMA
      	information.
      
      Round robin assignment doesn't work because init_numa_sched_groups_power()
      assumes all logical cpus in the same physical cpu share the same node
      (then it only accounts for group_first_cpu()), and the simple round robin
      breaks the above assumption.
      
      Thus, this patch implements a reassignment of node-ids if buggy firmware
      or numa emulation makes wrong cpu node map. Tt enforce all logical cpus
      in the same physical cpu share the same node.
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/20110415203928.1303.A69D9226@jp.fujitsu.comSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      7d6b4670
  2. 12 Apr, 2011 1 commit
    • Jacob Pan's avatar
      x86/mrst: Fix boot crash caused by incorrect pin to irq mapping · 9d90e49d
      Jacob Pan authored
      Moorestown systems crash on boot because the secondary CPU
      clockevent (apbt1) will fail to request irq#1, which does not
      have ioapic chip in its irq_desc[] entry.
      
      Background:
      
      Moorestown platform does not have ISA bus nor legacy IRQs. It
      reuses the range of legacy IRQs for regular device interrupts.
      The routing information of early system device IRQs (timers) are
      obtained from firmware provided SFI tables. We reuse/fake MP
      configuration table to facilitate IRQ setup with IOAPIC.
      
      Maintaining a 1:1 mapping of IOAPIC pin (RTE entry) and IRQ#
      makes routing information clean and easy to understand on
      Moorestown. Though optional.
      
      This patch allows SFI timer and vRTC IRQ to be treated as ISA
      IRQ so that pin2irq mapping will be 1:1.
      
      Also fixed MP table type and use macros to clearly set MP IRQ
      entries. As a result, apbt timer and RTC interrupts on
      Moorestown are within legacy IRQ range:
      
       # cat /proc/interrupts
                  CPU0       CPU1
         0:      11249          0   IO-APIC-edge      apbt0
         1:          0      12271   IO-APIC-edge      apbt1
         8:        887          0   IO-APIC-fasteoi   dw_spi
        13:          0          0   IO-APIC-fasteoi   INTEL_MID_DMAC2
        14:          0          0   IO-APIC-fasteoi   rtc0
      
      Further discussion of this patch can be found at:
      
        https://lkml.org/lkml/2010/6/10/70Suggested-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarJacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Link: http://lkml.kernel.org/r/1302286980-21139-1-git-send-email-jacob.jun.pan@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9d90e49d
  3. 11 Apr, 2011 1 commit
  4. 10 Apr, 2011 1 commit
  5. 09 Apr, 2011 3 commits
  6. 08 Apr, 2011 4 commits
  7. 07 Apr, 2011 19 commits
  8. 06 Apr, 2011 10 commits