Commit 9c3bafaa authored by Benjamin Berg, committed by Borislav Petkov

x86/mce: Lower throttling MCE messages' priority to warning

On modern CPUs it is quite normal for the temperature limits to be
reached and the CPU to be throttled. In fact, the thermal design is often
not sufficient to cool the CPU at full load, and the limits can be reached
quickly when a burst in load happens. This happens even with
technologies like RAPL limiting the long-term power consumption of
the package.

Also, these limits are "softer", as Srinivas explains:

"CPU temperature doesn't have to hit max(TjMax) to get these warnings.
OEMs ha[ve] an ability to program a threshold where a thermal interrupt
can be generated. In some systems the offset is 20C+ (Read only value).

In recent systems, there is another offset on top of it which can be
programmed by OS, once some agent can adjust power limits dynamically.
By default this is set to low by the firmware, which I guess [is] the
prime motivation of Benjamin to submit the patch."

So these messages do not usually indicate a hardware issue (e.g.
insufficient cooling). Log them as warnings to avoid confusion about
their severity.

 [ bp: Massage commit message. ]
Signed-off-by: Benjamin Berg <bberg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Christian Kellner <ckellner@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191009155424.249277-1-bberg@redhat.com
parent 70f0c230
@@ -188,7 +188,7 @@ static void therm_throt_process(bool new_event, int event, int level)
 	/* if we just entered the thermal event */
 	if (new_event) {
 		if (event == THERMAL_THROTTLING_EVENT)
-			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
+			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
 				this_cpu,
 				level == CORE_LEVEL ? "Core" : "Package",
 				state->count);