    x86/mce: Lower throttling MCE messages' priority to warning · 9c3bafaa
    Benjamin Berg authored
    On modern CPUs it is quite normal that the temperature limits are
    reached and the CPU is throttled. In fact, often the thermal design is
    not sufficient to cool the CPU at full load and limits can quickly be
    reached when a burst in load happens. This happens even with
    technologies like RAPL limiting the long-term power consumption of
    the package.
    
    Also, these limits are "softer", as Srinivas explains:
    
    "CPU temperature doesn't have to hit max(TjMax) to get these warnings.
    OEMs ha[ve] an ability to program a threshold where a thermal interrupt
    can be generated. In some systems the offset is 20C+ (Read only value).
    
    In recent systems, there is another offset on top of it which can be
    programmed by OS, once some agent can adjust power limits dynamically.
    By default this is set to low by the firmware, which I guess [is] the
    prime motivation of Benjamin to submit the patch."
    
    So these messages do not usually indicate a hardware issue (e.g.
    insufficient cooling). Log them as warnings to avoid confusion about
    their severity.
    
     [ bp: Massage commit message. ]
    Signed-off-by: Benjamin Berg <bberg@redhat.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Tested-by: Christian Kellner <ckellner@redhat.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: linux-edac <linux-edac@vger.kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: x86-ml <x86@kernel.org>
    Link: https://lkml.kernel.org/r/20191009155424.249277-1-bberg@redhat.com
therm_throt.c 14.5 KB