    x86/mce: Fix incorrect "Machine check from unknown source" message · 40c36e27
    Tony Luck authored
    Some injection testing resulted in the following console log:
    
      mce: [Hardware Error]: CPU 22: Machine Check Exception: f Bank 1: bd80000000100134
      mce: [Hardware Error]: RIP 10:<ffffffffc05292dd> {pmem_do_bvec+0x11d/0x330 [nd_pmem]}
      mce: [Hardware Error]: TSC c51a63035d52 ADDR 3234bc4000 MISC 88
      mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1526502199 SOCKET 0 APIC 38 microcode 2000043
      mce: [Hardware Error]: Run the above through 'mcelog --ascii'
      Kernel panic - not syncing: Machine check from unknown source
    
    This confused everybody because the first line quite clearly shows
    that we found a logged error in "Bank 1", while the last line says
    "unknown source".
    
    The problem is that the Linux code doesn't do the right thing
    for a local machine check that results in a fatal error.
    
    It turns out that we know very early in the handler whether the
    machine check is fatal. The call to mce_no_way_out() has checked
    all the banks for the CPU that took the local machine check. If
    it says we must crash, we can do so right away with the right
    messages.
    
    We still scan all the banks again later. This means that we might
    initially not see a problem, but find something fatal during the
    second scan. If this happens we print a slightly different message
    (so I can see if it actually ever happens).
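    A minimal C sketch of the decision order described above. It is not
    the kernel's mce.c code: struct bank, enum sev, enum outcome,
    worst_severity(), local_mc_outcome(), outcome_message() and the
    message strings are all hypothetical stand-ins. worst_severity() only
    plays the role that mce_no_way_out() plays in the early check, and the
    "rescan" argument models the bank contents as seen by the later full
    scan, which may differ from the first look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified severity grades; the real handler's
 * per-bank grading is much richer than this. */
enum sev { SEV_NONE = 0, SEV_CORRECTED, SEV_PANIC };

/* Hypothetical model of one MCA bank: a "valid" bit plus its grade. */
struct bank {
	bool valid;
	enum sev sev;
};

/* Outcomes of a local machine check in this sketch. */
enum outcome {
	MC_RECOVER,      /* nothing fatal found in either scan       */
	MC_PANIC_EARLY,  /* the first look already said "no way out" */
	MC_PANIC_RESCAN, /* only the second scan found a fatal error */
};

/* Worst severity over all valid banks (stand-in for mce_no_way_out()). */
static enum sev worst_severity(const struct bank *b, size_t n)
{
	enum sev worst = SEV_NONE;

	assert(b != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (b[i].valid && b[i].sev > worst)
			worst = b[i].sev;
	return worst;
}

/* "first" is the bank state seen by the early check, "rescan" the state
 * seen by the later full scan of the same banks. */
static enum outcome local_mc_outcome(const struct bank *first,
				     const struct bank *rescan, size_t n)
{
	/* Already known to be fatal: crash right away, naming the bank. */
	if (worst_severity(first, n) >= SEV_PANIC)
		return MC_PANIC_EARLY;

	/* Missed the first time: panic with a distinguishable message. */
	if (worst_severity(rescan, n) >= SEV_PANIC)
		return MC_PANIC_RESCAN;

	return MC_RECOVER;
}

/* Placeholder panic text for each outcome (hypothetical strings). */
static const char *outcome_message(enum outcome o)
{
	switch (o) {
	case MC_PANIC_EARLY:
		return "Fatal local machine check";
	case MC_PANIC_RESCAN:
		return "Local fatal machine check!";
	default:
		return NULL;
	}
}
```

    Returning an outcome instead of calling panic() keeps the sketch
    testable; in a real handler each fatal branch would end in a panic
    that prints the corresponding message.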
    
    [ bp: Remove unneeded severity assignment. ]
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ashok Raj <ashok.raj@intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
    Cc: linux-edac <linux-edac@vger.kernel.org>
    Cc: stable@vger.kernel.org # 4.2
    Link: http://lkml.kernel.org/r/52e049a497e86fd0b71c529651def8871c804df0.1527283897.git.tony.luck@intel.com