    powerpc/powernv/mce: Print additional information about MCE error. · 50dbabe0
    Mahesh Salgaonkar authored
    Print additional information about an MCE error, indicating whether it
    is a hardware or a software error.
    
    Some MCE errors can be easily categorized as hardware or software
    errors, e.g. UEs are due to hardware errors, whereas an error
    triggered by invalid usage of tlbie is a pure software bug. But not
    all MCE errors can be easily categorized as either software or
    hardware. Multihit errors, for example, are usually the result of a
    software bug, but in some rare cases a hardware failure can cause a
    multihit error. In the past, we have seen cases where multihit errors
    stopped occurring after a faulty chip was replaced. The same applies
    to parity errors, which are usually due to faulty hardware, but there
    is a chance that a multihit can also cause a parity error. For such
    errors it is difficult to determine the real cause. Hence this patch
    classifies MCE errors into the following four categories:
    
      1. Hardware error
      	UE and Link timeout failure errors.
      2. Probable hardware error (some chance of software cause)
      	SLB/ERAT/TLB Parity errors.
      3. Software error
      	Invalid tlbie form.
      4. Probable software error (some chance of hardware cause)
      	SLB/ERAT/TLB Multihit errors.
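
    The four categories above can be sketched as a simple lookup. This is
    only an illustration, not the code added to mce.c: the enum names and
    the helpers mce_classify(), mce_class_str() and
    mce_format_class_line() are hypothetical, and of the category strings
    only the "Probable Software error" one is taken from the sample output
    in this commit message; the other strings are assumed by analogy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error kinds, named after the cases listed in the commit message. */
enum mce_kind {
	MCE_KIND_UE,
	MCE_KIND_LINK_TIMEOUT,
	MCE_KIND_SLB_PARITY,
	MCE_KIND_ERAT_PARITY,
	MCE_KIND_TLB_PARITY,
	MCE_KIND_INVALID_TLBIE,
	MCE_KIND_SLB_MULTIHIT,
	MCE_KIND_ERAT_MULTIHIT,
	MCE_KIND_TLB_MULTIHIT,
	MCE_KIND_OTHER
};

/* Hypothetical error classes: one per category, plus "unknown". */
enum mce_class {
	MCE_CLASS_UNKNOWN,
	MCE_CLASS_HARDWARE,
	MCE_CLASS_PROBABLE_HARDWARE,
	MCE_CLASS_SOFTWARE,
	MCE_CLASS_PROBABLE_SOFTWARE
};

/* Map an error kind to one of the four categories. */
static enum mce_class mce_classify(enum mce_kind kind)
{
	switch (kind) {
	case MCE_KIND_UE:
	case MCE_KIND_LINK_TIMEOUT:
		return MCE_CLASS_HARDWARE;
	case MCE_KIND_SLB_PARITY:
	case MCE_KIND_ERAT_PARITY:
	case MCE_KIND_TLB_PARITY:
		return MCE_CLASS_PROBABLE_HARDWARE;
	case MCE_KIND_INVALID_TLBIE:
		return MCE_CLASS_SOFTWARE;
	case MCE_KIND_SLB_MULTIHIT:
	case MCE_KIND_ERAT_MULTIHIT:
	case MCE_KIND_TLB_MULTIHIT:
		return MCE_CLASS_PROBABLE_SOFTWARE;
	default:
		return MCE_CLASS_UNKNOWN;
	}
}

/* Human-readable text for a class (wording assumed except the last case). */
static const char *mce_class_str(enum mce_class c)
{
	switch (c) {
	case MCE_CLASS_HARDWARE:
		return "Hardware error";
	case MCE_CLASS_PROBABLE_HARDWARE:
		return "Probable Hardware error (some chance of software cause)";
	case MCE_CLASS_SOFTWARE:
		return "Software error";
	case MCE_CLASS_PROBABLE_SOFTWARE:
		return "Probable Software error (some chance of hardware cause)";
	default:
		return "Unknown";
	}
}

/* Format the category line in the style of the sample output. */
static int mce_format_class_line(char *buf, size_t len, int cpu,
				 enum mce_kind kind)
{
	assert(buf != NULL);
	return snprintf(buf, len, "MCE: CPU%d: %s", cpu,
			mce_class_str(mce_classify(kind)));
}
```

    In this sketch, calling mce_format_class_line() with CPU 80 and an SLB
    multihit produces a line of the same shape as the last line of the
    sample output below.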
    
    Sample output:
    
      MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 000001001b6e0320 [Recovered]
      MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [00007fffa309dc60]
      MCE: CPU80: Probable Software error (some chance of hardware cause)

    Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
mce.c 17.4 KB