[PATCH] New machine check handler for x86-64
This adds a completely rewritten machine check handler for x86-64. The old one never worked on 2.6.

The new handler has many improvements. It now closely follows the Intel and AMD recommendations for MCE handlers (the old one had many violations). It handles unrecoverable errors in user space better: where possible it kills only the affected process instead of panicking. It is also CPU independent and should work on any CPU that supports the standard x86 MCA architecture.

The new handler logs to the console only fatal errors that lead to a kernel panic. Non-fatal errors are logged race-free into a new (non-ring) buffer and supplied to user space through a new character device. The old handler could deadlock on the console and printk locks. This also separates machine check errors more clearly from real kernel errors.

The new buffer has also been designed to be easily accessible from external debugging tools: it has a signature and could even be recovered after a reboot. It is not organized as a ring buffer, which means the first errors are kept unless explicitly cleared.

The new error formats can be parsed using ftp://ftp.suse.com/pub/people/ak/x86-64/mcelog.c

The new character device can be created with

  mknod /dev/mcelog c 10 227

There is a new sysfs interface to configure the machine check handler. It has a "tolerant" parameter that defines how aggressively the handler reacts to a machine check:

  0: always panic
  1: panic if a deadlock is possible (e.g. the MCE happened in the kernel)
  2: try to avoid a panic

The default is 2.

Despite having more features, the new handler is shorter.
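The non-ring error buffer described above can be sketched in user-space C11 roughly as follows. Writers reserve a slot with compare-and-swap, so logging needs no lock. Once the buffer is full, new records are dropped and the first ones survive until an explicit clear. The record layout, capacity, signature string and all function names here are hypothetical; the actual layout is defined by the patch's mce.c and mce.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MCELOG_LEN 32                    /* hypothetical capacity */
#define MCELOG_SIGNATURE "MCELOG-SKETCH" /* hypothetical signature string */

struct mce_record {   /* hypothetical record layout */
    uint64_t status;  /* e.g. a bank's status register value */
    uint64_t addr;    /* faulting address, if valid */
    int bank;
    int cpu;
};

struct mce_log_buf {
    char signature[16];               /* lets external tools locate the buffer */
    atomic_uint next;                 /* index of the next free slot */
    atomic_bool finished[MCELOG_LEN]; /* set once a slot is completely written */
    struct mce_record entry[MCELOG_LEN];
};

void mce_log_init(struct mce_log_buf *log)
{
    memset(log->signature, 0, sizeof log->signature);
    strncpy(log->signature, MCELOG_SIGNATURE, sizeof log->signature - 1);
    atomic_init(&log->next, 0);
    for (int i = 0; i < MCELOG_LEN; i++)
        atomic_init(&log->finished[i], false);
}

/* Returns false if the buffer is full. The new record is then dropped and
   the oldest records are kept: there is no ring-buffer overwrite. */
bool mce_log_add(struct mce_log_buf *log, const struct mce_record *rec)
{
    unsigned int slot = atomic_load(&log->next);
    do {
        if (slot >= MCELOG_LEN)
            return false;
        /* On failure, slot is reloaded with the current value of next. */
    } while (!atomic_compare_exchange_weak(&log->next, &slot, slot + 1));

    assert(slot < MCELOG_LEN); /* only in-range slots can be reserved */
    log->entry[slot] = *rec;
    atomic_store(&log->finished[slot], true); /* publish after the copy */
    return true;
}

/* Explicit clear. Assumption: called with no concurrent writers. */
void mce_log_clear(struct mce_log_buf *log)
{
    for (int i = 0; i < MCELOG_LEN; i++)
        atomic_store(&log->finished[i], false);
    atomic_store(&log->next, 0);
}
```

Each slot's `finished` flag lets a reader, or a tool scanning memory after a reboot, trust only the records whose flag is set. A slot reserved by a writer that was interrupted before finishing stays unmarked.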
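The three "tolerant" levels above can be read as a small decision function. The sketch below is only one interpretation of that list. `mce_decide`, `enum mce_action` and the `uncorrected`/`in_kernel` inputs are hypothetical names, not code from mce.c. Treating an uncorrected kernel-mode error at level 2 as "log and keep running" is an assumption about what "try to avoid a panic" means.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of handling one machine check. */
enum mce_action {
    MCE_ACTION_LOG_ONLY,     /* keep running; record the error in the log */
    MCE_ACTION_KILL_PROCESS, /* kill only the interrupted user process */
    MCE_ACTION_PANIC         /* bring the whole system down */
};

/* uncorrected: the hardware could not correct the error.
   in_kernel:   the machine check interrupted kernel mode, not user space. */
enum mce_action mce_decide(int tolerant, bool uncorrected, bool in_kernel)
{
    assert(tolerant >= 0 && tolerant <= 2); /* documented range */

    /* Corrected errors never stop the machine; they only go to the log. */
    if (!uncorrected)
        return MCE_ACTION_LOG_ONLY;

    /* tolerant 0: any uncorrected error panics. */
    if (tolerant == 0)
        return MCE_ACTION_PANIC;

    /* The error hit user space: killing that process is enough. */
    if (!in_kernel)
        return MCE_ACTION_KILL_PROCESS;

    /* Kernel context: tolerant 1 panics because continuing could deadlock;
       tolerant 2 (assumption) keeps running and only logs the error. */
    return tolerant == 1 ? MCE_ACTION_PANIC : MCE_ACTION_LOG_ONLY;
}
```

Under this reading, levels 1 and 2 behave the same for user-space errors and differ only when the machine check interrupts the kernel.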
arch/x86_64/kernel/mce.c (new file, mode 100644)
include/asm-x86_64/mce.h (new file, mode 100644)