1. 24 Sep, 2009 1 commit
    • x86: Reduce verbosity of "TSC is reliable" message · ea01c0d7
      Roland Dreier authored
      On modern systems, the kernel prints the message
      
          Skipping synchronization checks as TSC is reliable.
      
      once for every non-boot CPU.
      
      This gets kind of ridiculous on huge systems; for example, on a
      64-thread system I was lucky enough to get:
      
          $ dmesg | grep 'TSC is reliable' | wc
               63     567    4221
      
      There's no point in doing this for every CPU, since the code is
      just checking the boot CPU anyway, so change this to a
      printk_once() so that the message appears only once.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      LKML-Reference: <adazl8l2swc.fsf@cisco.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
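The print-once idea in the commit above can be illustrated in userspace. This is a minimal sketch, not the kernel's implementation: `print_once`, `check_tsc_sync`, `bring_up_cpus` and the `lines_printed` counter are hypothetical names introduced here, and the macro is only assumed to behave like printk_once() in that each call site prints at most once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts lines actually emitted; exists only so the sketch can be checked. */
static int lines_printed;

/*
 * Hypothetical userspace stand-in for a print-once helper: every expansion
 * gets its own static flag, so the message is printed the first time that
 * call site runs and skipped on every later pass.
 */
#define print_once(...)                      \
    do {                                     \
        static bool already_printed;         \
        if (!already_printed) {              \
            already_printed = true;          \
            printf(__VA_ARGS__);             \
            lines_printed++;                 \
        }                                    \
    } while (0)

/* Stand-in for the per-CPU path that previously printed on every call. */
static void check_tsc_sync(int cpu)
{
    (void)cpu; /* the check does not depend on which CPU is coming up */
    print_once("Skipping synchronization checks as TSC is reliable.\n");
}

/* Simulate bringing up the non-boot CPUs; returns the total lines printed. */
static int bring_up_cpus(int nr_cpus)
{
    assert(nr_cpus > 0);
    for (int cpu = 1; cpu < nr_cpus; cpu++)
        check_tsc_sync(cpu);
    return lines_printed;
}
```

Calling `bring_up_cpus(64)` runs the check 63 times but emits the message once. The static flag in this sketch is not thread-safe, which is acceptable for an illustration but would need attention in concurrent code.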
  2. 23 Sep, 2009 2 commits
    • x86: mce: Use safer ways to access MCE registers · 11868a2d
      Ingo Molnar authored
      Use rdmsrl_safe() when accessing MCE registers. While in
      theory we always 'know' which ones are safe to access from
      the capability bits, there are many hardware variations
      and reality might differ from theory, as it did in this case:
      
         http://bugzilla.kernel.org/show_bug.cgi?id=14204
      
      [    0.010016] mce: CPU supports 5 MCE banks
      [    0.011029] general protection fault: 0000 [#1]
      [    0.011998] last sysfs file:
      [    0.011998] Modules linked in:
      [    0.011998]
      [    0.011998] Pid: 0, comm: swapper Not tainted (2.6.31_router #1) HP Vectra
      [    0.011998] EIP: 0060:[<c100d9b9>] EFLAGS: 00010246 CPU: 0
      [    0.011998] EIP is at mce_rdmsrl+0x19/0x60
      [    0.011998] EAX: 00000000 EBX: 00000001 ECX: 00000407 EDX: 08000000
      [    0.011998] ESI: 00000000 EDI: 8c000000 EBP: 00000405 ESP: c17d5eac
      
      So use WARN_ONCE() instead of crashing the box.

      (Also fix a number of stylistic inconsistencies in the code.)

      Note, we might still crash in wrmsrl() if we get that far, but
      we shouldn't get that far if the registers are truly inaccessible.
      Reported-by: GNUtoo <GNUtoo@no-log.org>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <bug-14204-5438@http.bugzilla.kernel.org/>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
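The "fail softly, warn once" pattern described in the commit above can be sketched in userspace. Real MSR access is not possible from ordinary user code, so this example assumes a simulated register file; `read_reg_safe`, `read_reg_or_zero`, the fake register tables and `warnings_emitted` are all hypothetical names, and falling back to zero is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_FAKE_REGS 8

/* Simulated register file: entries marked false "fault" when read. */
static const bool fake_reg_accessible[NR_FAKE_REGS] = {
    true, true, true, false, true, true, true, false
};
static const uint64_t fake_reg_value[NR_FAKE_REGS] = {
    1, 2, 3, 0, 5, 6, 7, 0
};

/* Counts warnings actually emitted; exists only so the sketch can be checked. */
static int warnings_emitted;

/* Hypothetical "safe" read: returns 0 on success, -1 instead of faulting. */
static int read_reg_safe(unsigned int reg, uint64_t *val)
{
    assert(val != NULL);
    if (reg >= NR_FAKE_REGS || !fake_reg_accessible[reg])
        return -1;
    *val = fake_reg_value[reg];
    return 0;
}

/* Read a register; on failure warn a single time and fall back to 0. */
static uint64_t read_reg_or_zero(unsigned int reg)
{
    static bool warned;
    uint64_t val;

    if (read_reg_safe(reg, &val) != 0) {
        if (!warned) {
            warned = true;
            warnings_emitted++;
            fprintf(stderr, "warning: unable to read register %u\n", reg);
        }
        return 0;
    }
    return val;
}
```

Reading register 1 returns its value, while reading the inaccessible registers 3 and 7 returns 0 and produces one warning in total rather than one per failure.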
  3. 22 Sep, 2009 3 commits
    • x86: mce, inject: Use real inject-msg in raise_local · 14c0abf1
      Huang Ying authored
      The current raise_local() uses a struct mce that comes from
      mce_write() as a parameter instead of the real inject-msg, so when
      we set mce.finished = 0 to clear the injected MCE, the real
      inject-msg stays valid.

      This causes the leftover inject-msg to affect the next injection,
      which is not desired.

      To fix this, the real inject-msg is used in raise_local() instead
      of the one on the stack.
      
      This patch is based on the diagnosis and the fixes by Dean Nelson.
      Reported-by: Dean Nelson <dnelson@redhat.com>
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <1253601357.15717.757.camel@yhuang-dev.sh.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
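The class of bug fixed in the commit above, clearing a flag on a temporary copy rather than on the stored record, can be shown with a reduced example. This is an illustrative sketch only: `struct inject_msg`, `pending` and the helper functions are hypothetical, and a by-value copy stands in for the stack copy mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for an injected record. */
struct inject_msg {
    int  cpu;
    bool finished; /* true while an injection is still pending */
};

/* The stored record that later injections will look at. */
static struct inject_msg pending;

/* Buggy shape: operates on a copy, so the stored record is untouched. */
static void clear_on_copy(struct inject_msg m)
{
    m.finished = false;
    (void)m;
}

/* Fixed shape: operates on the stored record through a pointer. */
static void clear_on_real(struct inject_msg *m)
{
    assert(m != NULL);
    m->finished = false;
}

/* Returns true if the stored record is still pending after the buggy clear. */
static bool stale_after_copy_clear(void)
{
    pending.finished = true;
    clear_on_copy(pending);
    return pending.finished;
}

/* Returns true if the stored record is still pending after the fixed clear. */
static bool stale_after_real_clear(void)
{
    pending.finished = true;
    clear_on_real(&pending);
    return pending.finished;
}
```

In this sketch `stale_after_copy_clear()` reports that the record is still pending, which is the leftover state that would leak into the next injection, while `stale_after_real_clear()` reports that it was cleared.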
    • x86: mce: Fix thermal throttling message storm · b417c9fd
      Ingo Molnar authored
      If a system switches back and forth between hot and cold mode,
      the MCE code will print a stream of critical kernel messages.
      
      Extend the throttling code to properly notice this, by
      only printing the first hot + cold transition and omitting
      the rest up to CHECK_INTERVAL (5 minutes).
      
      This way we'll only get a single incident of:
      
       [  102.356584] CPU0: Temperature above threshold, cpu clock throttled (total events = 1)
       [  102.357000] Disabling lock debugging due to kernel taint
       [  102.369223] CPU0: Temperature/speed normal
      
      every 5 minutes. The 'total events' count gives the number of cold/hot
      transitions detected, should overheating occur again after 5 minutes:
      
      [  402.357580] CPU0: Temperature above threshold, cpu clock throttled (total events = 24891)
      [  402.358001] CPU0: Temperature/speed normal
      [  450.704142] Machine check events logged
      
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
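The rate-limiting behaviour described in the commit above (print the first hot and cold transition, stay quiet until the interval expires, keep counting events) can be sketched as follows. This is a simplified userspace model under stated assumptions, not the kernel code: `struct throttle_state`, `thermal_sample`, `simulate_flapping`, the one-second sampling and the message text are all hypothetical; only the 5-minute interval comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define CHECK_INTERVAL_SECS 300 /* 5 minutes, per the commit message */

/* Hypothetical per-CPU state; field names are illustrative. */
struct throttle_state {
    bool          is_hot;       /* last observed state */
    bool          reported_hot; /* was the current hot episode printed? */
    unsigned long next_report;  /* earliest time (s) for the next hot message */
    unsigned long total_events; /* all cold-to-hot transitions, printed or not */
};

/* Feed one sample; returns true if a message was printed for it. */
static bool thermal_sample(struct throttle_state *s, bool hot, unsigned long now)
{
    bool was_hot;

    assert(s != NULL);
    was_hot = s->is_hot;
    s->is_hot = hot;

    if (hot && !was_hot) {
        s->total_events++;
        if (now < s->next_report) {
            s->reported_hot = false; /* suppressed: still inside the interval */
            return false;
        }
        s->next_report = now + CHECK_INTERVAL_SECS;
        s->reported_hot = true;
        printf("cpu: temperature above threshold (total events = %lu)\n",
               s->total_events);
        return true;
    }
    if (!hot && was_hot && s->reported_hot) {
        /* Only announce "normal" for an episode whose "hot" was announced. */
        s->reported_hot = false;
        printf("cpu: temperature/speed normal\n");
        return true;
    }
    return false;
}

/* Alternate hot/cold once per second over [start, end); count messages. */
static int simulate_flapping(unsigned long start, unsigned long end)
{
    struct throttle_state s = { 0 };
    int printed = 0;

    for (unsigned long t = start; t < end; t++) {
        bool hot = ((t - start) % 2 == 0);
        if (thermal_sample(&s, hot, t))
            printed++;
    }
    return printed;
}
```

With samples flapping every second, a 300-second window yields one hot message and one normal message; extending the window just past the interval yields a second pair, whose event count reflects every suppressed transition in between.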
    • x86: mce: Clean up thermal throttling state tracking code · 39676840
      Ingo Molnar authored
      Instead of a mess of three separate percpu variables, consolidate
      the state into a single structure.
      
      Also clean up therm_throt_process(), using cleaner, more
      understandable variable names and clearer logic.
      
      This, without changing the logic, makes the code more
      streamlined, more readable and smaller as well:
      
         text	   data	    bss	    dec	    hex	filename
         1487	    169	      4	   1660	    67c	therm_throt.o.before
         1432	    176	      4	   1612	    64c	therm_throt.o.after
      
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 21 Sep, 2009 25 commits
  5. 20 Sep, 2009 9 commits