1. 24 Sep, 2009 2 commits
    • Roland Dreier's avatar
      x86: Reduce verbosity of "PAT enabled" kernel message · e23a8b6a
      Roland Dreier authored
      On modern systems, the kernel prints the message
      
          x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
      
      once for every CPU.
      
      This gets kind of ridiculous on huge systems; for example, on a
      64-thread system I was lucky enough to get:
      
          dmesg| grep 'PAT enabled' | wc
               64     704    5174
      
      There is already a BUG() if non-boot CPUs have PAT capabilities
      that don't match the boot CPU, so just print the message on the
      boot CPU. (I kept the print after the wrmsrl() that enables PAT,
      so that the log output continues to mean that the system survived
      enabling PAT on the boot CPU)
      Signed-off-by: default avatarRoland Dreier <rolandd@cisco.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      LKML-Reference: <adavdj92sso.fsf@cisco.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      e23a8b6a
    • Roland Dreier's avatar
      x86: Reduce verbosity of "TSC is reliable" message · ea01c0d7
      Roland Dreier authored
      On modern systems, the kernel prints the message
      
          Skipping synchronization checks as TSC is reliable.
      
      once for every non-boot CPU.
      
      This gets kind of ridiculous on huge systems; for example, on a
      64-thread system I was lucky enough to get:
      
          $ dmesg | grep 'TSC is reliable' | wc
               63     567    4221
      
      There's no point to doing this for every CPU, since the code is
      just checking the boot CPU anyway, so change this to a
      printk_once() to make the message appears only once.
      Signed-off-by: default avatarRoland Dreier <rolandd@cisco.com>
      LKML-Reference: <adazl8l2swc.fsf@cisco.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      ea01c0d7
  2. 23 Sep, 2009 2 commits
    • Ingo Molnar's avatar
      x86: mce: Use safer ways to access MCE registers · 11868a2d
      Ingo Molnar authored
      Use rdmsrl_safe() when accessing MCE registers. While in
      theory we always 'know' which ones are safe to access from
      the capability bits, there's a lot of hardware variations
      and reality might differ from theory, as it did in this case:
      
         http://bugzilla.kernel.org/show_bug.cgi?id=14204
      
      [    0.010016] mce: CPU supports 5 MCE banks
      [    0.011029] general protection fault: 0000 [#1]
      [    0.011998] last sysfs file:
      [    0.011998] Modules linked in:
      [    0.011998]
      [    0.011998] Pid: 0, comm: swapper Not tainted (2.6.31_router #1) HP Vectra
      [    0.011998] EIP: 0060:[<c100d9b9>] EFLAGS: 00010246 CPU: 0
      [    0.011998] EIP is at mce_rdmsrl+0x19/0x60
      [    0.011998] EAX: 00000000 EBX: 00000001 ECX: 00000407 EDX: 08000000
      [    0.011998] ESI: 00000000 EDI: 8c000000 EBP: 00000405 ESP: c17d5eac
      
      So WARN_ONCE() instead of crashing the box.
      
      ( also fix a number of stylistic inconsistencies in the code. )
      
      Note, we might still crash in wrmsrl() if we get that far, but
      we shouldnt if the registers are truly inaccessible.
      Reported-by: default avatarGNUtoo <GNUtoo@no-log.org>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <bug-14204-5438@http.bugzilla.kernel.org/>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      11868a2d
    • Ingo Molnar's avatar
  3. 22 Sep, 2009 3 commits
    • Huang Ying's avatar
      x86: mce, inject: Use real inject-msg in raise_local · 14c0abf1
      Huang Ying authored
      Current raise_local() uses a struct mce that comes from mce_write()
      as a parameter instead of the real inject-msg, so when we set
      mce.finished = 0 to clear injected MCE, the real inject stays
      valid.
      
      This will cause the remaining inject-msg affect the next injection,
      which is not desired.
      
      To fix this, real inject-msg is used in raise_local instead of the
      one on the stack.
      
      This patch is based on the diagnosis and the fixes by Dean Nelson.
      Reported-by: default avatarDean Nelson <dnelson@redhat.com>
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <1253601357.15717.757.camel@yhuang-dev.sh.intel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      14c0abf1
    • Ingo Molnar's avatar
      x86: mce: Fix thermal throttling message storm · b417c9fd
      Ingo Molnar authored
      If a system switches back and forth between hot and cold mode,
      the MCE code will print a stream of critical kernel messages.
      
      Extend the throttling code to properly notice this, by
      only printing the first hot + cold transition and omitting
      the rest up to CHECK_INTERVAL (5 minutes).
      
      This way we'll only get a single incident of:
      
       [  102.356584] CPU0: Temperature above threshold, cpu clock throttled (total events = 1)
       [  102.357000] Disabling lock debugging due to kernel taint
       [  102.369223] CPU0: Temperature/speed normal
      
      Every 5 minutes. The 'total events' count tells the number of cold/hot
      transitions detected, should overheating occur after 5 minutes again:
      
      [  402.357580] CPU0: Temperature above threshold, cpu clock throttled (total events = 24891)
      [  402.358001] CPU0: Temperature/speed normal
      [  450.704142] Machine check events logged
      
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b417c9fd
    • Ingo Molnar's avatar
      x86: mce: Clean up thermal throttling state tracking code · 39676840
      Ingo Molnar authored
      Instead of a mess of three separate percpu variables, consolidate
      the state into a single structure.
      
      Also clean up therm_throt_process(), use cleaner and more
      understandable variable names and a clearer logic.
      
      This, without changing the logic, makes the code more
      streamlined, more readable and smaller as well:
      
         text	   data	    bss	    dec	    hex	filename
         1487	    169	      4	   1660	    67c	therm_throt.o.before
         1432	    176	      4	   1612	    64c	therm_throt.o.after
      
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      39676840
  4. 21 Sep, 2009 25 commits
  5. 20 Sep, 2009 8 commits