    x86, mce: extend struct mce user interface with more information. · 8ee08347
    Andi Kleen authored
    Experience has shown that struct mce, which is used to pass a machine
    check to the user space daemon, currently has a few limitations. Some
    data that is useful to print at panic level is also missing.
    
    This patch addresses most of them. The same information is also
    printed together with the MCE panic message.
    
    struct mce can be painlessly extended in a compatible way: the mcelog
    user space code just ignores additional fields with a warning.
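
A minimal sketch of why appending fields stays compatible, assuming the
reader learns the record length at run time and only copies the bytes it
knows about. `struct old_record` and `read_record` are hypothetical names
used for illustration; they are not taken from mcelog or from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down record layout known to an older reader. */
struct old_record {
    unsigned long long status;
    unsigned long long addr;
};

/*
 * Copy record number idx out of a buffer whose records are rec_len bytes
 * each. Only the leading bytes the reader understands are copied; fields
 * missing from a shorter record are left zeroed.
 * Returns the number of trailing bytes per record that were ignored, so
 * the caller can print a warning when it is non-zero.
 */
static size_t read_record(const unsigned char *buf, size_t rec_len,
                          size_t idx, struct old_record *out)
{
    size_t known = sizeof(*out);
    size_t n = rec_len < known ? rec_len : known;

    assert(buf != NULL && out != NULL);
    memset(out, 0, known);
    memcpy(out, buf + idx * rec_len, n);
    return rec_len - n;
}
```

Because new fields are only ever appended, the offsets of the existing
fields never change, and an older reader sees the same leading bytes as
before.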
    
    - It doesn't provide a wall time timestamp. There have been a few
      complaints about that. Fix that by adding a 64-bit time_t.
    
    - It doesn't provide the exact CPU identification. This makes
      it awkward for mcelog to decode the event correctly, especially
      when there are variations in the supported MCE codes on different
      CPU models or when mcelog is running on a different host after a panic.
      Previously the administrator had to specify the correct CPU
      when mcelog ran on a different host, but with the growing variation
      in machine checks it is now better to auto-detect that.
      It's also useful for more detailed analysis of CPU events.
      Pass CPUID 1.EAX and the cpu vendor (as encoded in processor.h) instead.
    
    - Socket ID and initial APIC ID are useful to report because they
      make it possible to identify the failing CPU in some (not all) cases.
      This is also especially useful for the panic situation.
      This addresses one of the complaints from Thomas Gleixner earlier.
    
    - The MCG capabilities MSR needs to be reported for some advanced
      error processing in mcelog.
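
As a rough illustration of how a decoder could use the CPU identification
described above, the sketch below splits a CPUID 1.EAX value into family,
model and stepping. The struct and function names are hypothetical and do
not come from this patch. The extended model rule follows the Intel
convention (applied when the base family is 6 or 15), which is an
assumption here; other vendors may differ.

```c
#include <stdint.h>

/* Hypothetical container for the decoded CPU signature. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode the signature returned in EAX by CPUID leaf 1. */
static struct cpu_sig decode_cpuid1_eax(uint32_t eax)
{
    struct cpu_sig sig;
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base_model = (eax >> 4) & 0xf;
    unsigned ext_family = (eax >> 20) & 0xff;
    unsigned ext_model = (eax >> 16) & 0xf;

    sig.stepping = eax & 0xf;

    /* The extended family is only added when the base family is 15. */
    sig.family = base_family;
    if (base_family == 0xf)
        sig.family += ext_family;

    /* Assumed Intel rule: extended model counts for base family 6 or 15. */
    sig.model = base_model;
    if (base_family == 0x6 || base_family == 0xf)
        sig.model |= ext_model << 4;

    return sig;
}
```

With the raw EAX value in each record, a tool running on a different host
can pick the right decoding tables without the administrator naming the
CPU by hand.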
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>