Commit a2ee2981 authored by Linus Torvalds's avatar Linus Torvalds

Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (80 commits)
  x86, mce: Add boot options for corrected errors
  x86, mce: Fix mce printing
  x86, mce: fix for mce counters
  x86, mce: support action-optional machine checks
  x86, mce: define MCE_VECTOR
  x86, mce: rename mce_notify_user to mce_notify_irq
  x86: fix panic with interrupts off (needed for MCE)
  x86, mce: export MCE severities coverage via debugfs
  x86, mce: implement new status bits
  x86, mce: print header/footer only once for multiple MCEs
  x86, mce: default to panic timeout for machine checks
  x86, mce: improve mce_get_rip
  x86, mce: make non Monarch panic message "Fatal machine check" too
  x86, mce: switch x86 machine check handler to Monarch election.
  x86, mce: implement panic synchronization
  x86, mce: implement bootstrapping for machine check wakeups
  x86, mce: check early in exception handler if panic is needed
  x86, mce: add table driven machine check grading
  x86, mce: remove TSC print heuristic
  x86, mce: log corrected errors when panicing
  ...
parents 7603ef03 0d595972
......@@ -48,6 +48,7 @@ o procps 3.2.0 # ps --version
o oprofile 0.9 # oprofiled --version
o udev 081 # udevinfo -V
o grub 0.93 # grub --version
o mcelog 0.6
Kernel compilation
==================
......@@ -276,6 +277,16 @@ before running exportfs or mountd. It is recommended that all NFS
services be protected from the internet-at-large by a firewall where
that is possible.
mcelog
------
In Linux 2.6.31+ the i386 kernel needs to run the mcelog utility
as a regular cronjob similar to the x86-64 kernel to process and log
machine check events when CONFIG_X86_NEW_MCE is enabled. Machine check
events are errors reported by the CPU. Processing them is strongly encouraged.
All x86-64 kernels since 2.6.4 require the mcelog utility to
process machine checks.
Getting updated software
========================
......@@ -365,6 +376,10 @@ FUSE
----
o <http://sourceforge.net/projects/fuse>
mcelog
------
o <ftp://ftp.kernel.org/pub/linux/utils/cpu/mce/mcelog/>
Networking
**********
......
......@@ -437,3 +437,13 @@ Why: Superseded by tdfxfb. I2C/DDC support used to live in a separate
driver but this caused driver conflicts.
Who: Jean Delvare <khali@linux-fr.org>
Krzysztof Helt <krzysztof.h1@wp.pl>
----------------------------
What: CONFIG_X86_OLD_MCE
When: 2.6.32
Why: Remove the old legacy 32bit machine check code. This has been
superseded by the newer machine check code from the 64bit port,
but the old version has been kept around for easier testing. Note this
doesn't impact the old P5 and WinChip machine check handlers.
Who: Andi Kleen <andi@firstfloor.org>
......@@ -5,21 +5,51 @@ only the AMD64 specific ones are listed here.
Machine check
mce=off disable machine check
mce=bootlog Enable logging of machine checks left over from booting.
Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables.
mce=off
Disable machine check
mce=no_cmci
Disable CMCI(Corrected Machine Check Interrupt) that
Intel processor supports. Usually this disablement is
not recommended, but it might be handy if your hardware
is misbehaving.
Note that you'll get more problems without CMCI than with
due to the shared banks, i.e. you might get duplicated
error logs.
mce=dont_log_ce
Don't make logs for corrected errors. All events reported
as corrected are silently cleared by OS.
This option will be useful if you have no interest in any
of corrected errors.
mce=ignore_ce
Disable features for corrected errors, e.g. polling timer
and CMCI. All events reported as corrected are not cleared
by OS and remained in its error banks.
Usually this disablement is not recommended, however if
there is an agent checking/clearing corrected errors
(e.g. BIOS or hardware monitoring applications), conflicting
with OS's error handling, and you cannot deactivate the agent,
then this option will be a help.
mce=bootlog
Enable logging of machine checks left over from booting.
Disabled by default on AMD because some BIOS leave bogus ones.
If your BIOS doesn't do that it's a good idea to enable though
to make sure you log even machine check events that result
in a reboot. On Intel systems it is enabled by default.
mce=nobootlog
Disable boot machine check logging.
mce=tolerancelevel (number)
mce=tolerancelevel[,monarchtimeout] (number,number)
tolerance levels:
0: always panic on uncorrected errors, log corrected errors
1: panic or SIGBUS on uncorrected errors, log corrected errors
2: SIGBUS or log uncorrected errors, log corrected errors
3: never panic or SIGBUS, log all errors (for testing only)
Default is 1
Can be also set using sysfs which is preferable.
monarchtimeout:
Sets the time in us to wait for other CPUs on machine checks. 0
to disable.
nomce (for compatibility with i386): same as mce=off
......
......@@ -41,7 +41,9 @@ check_interval
the polling interval. When the poller stops finding MCEs, it
triggers an exponential backoff (poll less often) on the polling
interval. The check_interval variable is both the initial and
maximum polling interval.
maximum polling interval. 0 means no polling for corrected machine
check errors (but some corrected errors might be still reported
in other ways)
tolerant
Tolerance level. When a machine check exception occurs for a non
......@@ -67,6 +69,10 @@ trigger
Program to run when a machine check event is detected.
This is an alternative to running mcelog regularly from cron
and allows to detect events faster.
monarch_timeout
How long to wait for the other CPUs to machine check too on a
exception. 0 to disable waiting for other CPUs.
Unit: us
TBD document entries for AMD threshold interrupt configuration
......
......@@ -789,10 +789,26 @@ config X86_MCE
to disable it. MCE support simply ignores non-MCE processors like
the 386 and 486, so nearly everyone can say Y here.
config X86_OLD_MCE
depends on X86_32 && X86_MCE
bool "Use legacy machine check code (will go away)"
default n
select X86_ANCIENT_MCE
---help---
Use the old i386 machine check code. This is merely intended for
testing in a transition period. Try this if you run into any machine
check related software problems, but report the problem to
linux-kernel. When in doubt say no.
config X86_NEW_MCE
depends on X86_MCE
bool
default y if (!X86_OLD_MCE && X86_32) || X86_64
config X86_MCE_INTEL
def_bool y
prompt "Intel MCE features"
depends on X86_64 && X86_MCE && X86_LOCAL_APIC
depends on X86_NEW_MCE && X86_LOCAL_APIC
---help---
Additional support for intel specific MCE features such as
the thermal monitor.
......@@ -800,19 +816,36 @@ config X86_MCE_INTEL
config X86_MCE_AMD
def_bool y
prompt "AMD MCE features"
depends on X86_64 && X86_MCE && X86_LOCAL_APIC
depends on X86_NEW_MCE && X86_LOCAL_APIC
---help---
Additional support for AMD specific MCE features such as
the DRAM Error Threshold.
config X86_ANCIENT_MCE
def_bool n
depends on X86_32
prompt "Support for old Pentium 5 / WinChip machine checks"
---help---
Include support for machine check handling on old Pentium 5 or WinChip
systems. These typically need to be enabled explicitely on the command
line.
config X86_MCE_THRESHOLD
depends on X86_MCE_AMD || X86_MCE_INTEL
bool
default y
config X86_MCE_INJECT
depends on X86_NEW_MCE
tristate "Machine check injector support"
---help---
Provide support for injecting machine checks for testing purposes.
If you don't know what a machine check is and you don't do kernel
QA it is safe to say n.
config X86_MCE_NONFATAL
tristate "Check for non-fatal errors on AMD Athlon/Duron / Intel Pentium 4"
depends on X86_32 && X86_MCE
depends on X86_OLD_MCE
---help---
Enabling this feature starts a timer that triggers every 5 seconds which
will look at the machine check registers to see if anything happened.
......@@ -825,11 +858,15 @@ config X86_MCE_NONFATAL
config X86_MCE_P4THERMAL
bool "check for P4 thermal throttling interrupt."
depends on X86_32 && X86_MCE && (X86_UP_APIC || SMP)
depends on X86_OLD_MCE && X86_MCE && (X86_UP_APIC || SMP)
---help---
Enabling this feature will cause a message to be printed when the P4
enters thermal throttling.
config X86_THERMAL_VECTOR
def_bool y
depends on X86_MCE_P4THERMAL || X86_MCE_INTEL
config VM86
bool "Enable VM86 support" if EMBEDDED
default y
......
......@@ -14,6 +14,7 @@ BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR)
BUILD_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR)
BUILD_INTERRUPT(reboot_interrupt,REBOOT_VECTOR)
BUILD_INTERRUPT3(invalidate_interrupt0,INVALIDATE_TLB_VECTOR_START+0,
smp_invalidate_interrupt)
......@@ -52,8 +53,16 @@ BUILD_INTERRUPT(spurious_interrupt,SPURIOUS_APIC_VECTOR)
BUILD_INTERRUPT(perf_pending_interrupt, LOCAL_PENDING_VECTOR)
#endif
#ifdef CONFIG_X86_MCE_P4THERMAL
#ifdef CONFIG_X86_THERMAL_VECTOR
BUILD_INTERRUPT(thermal_interrupt,THERMAL_APIC_VECTOR)
#endif
#ifdef CONFIG_X86_MCE_THRESHOLD
BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR)
#endif
#ifdef CONFIG_X86_NEW_MCE
BUILD_INTERRUPT(mce_self_interrupt,MCE_SELF_VECTOR)
#endif
#endif
......@@ -22,7 +22,7 @@ typedef struct {
#endif
#ifdef CONFIG_X86_MCE
unsigned int irq_thermal_count;
# ifdef CONFIG_X86_64
# ifdef CONFIG_X86_MCE_THRESHOLD
unsigned int irq_threshold_count;
# endif
#endif
......
......@@ -34,6 +34,7 @@ extern void perf_pending_interrupt(void);
extern void spurious_interrupt(void);
extern void thermal_interrupt(void);
extern void reschedule_interrupt(void);
extern void mce_self_interrupt(void);
extern void invalidate_interrupt(void);
extern void invalidate_interrupt0(void);
......@@ -46,6 +47,7 @@ extern void invalidate_interrupt6(void);
extern void invalidate_interrupt7(void);
extern void irq_move_cleanup_interrupt(void);
extern void reboot_interrupt(void);
extern void threshold_interrupt(void);
extern void call_function_interrupt(void);
......
......@@ -25,6 +25,7 @@
*/
#define NMI_VECTOR 0x02
#define MCE_VECTOR 0x12
/*
* IDT vectors usable for external interrupt sources start
......@@ -87,13 +88,8 @@
#define CALL_FUNCTION_VECTOR 0xfc
#define CALL_FUNCTION_SINGLE_VECTOR 0xfb
#define THERMAL_APIC_VECTOR 0xfa
#ifdef CONFIG_X86_32
/* 0xf8 - 0xf9 : free */
#else
# define THRESHOLD_APIC_VECTOR 0xf9
# define UV_BAU_MESSAGE 0xf8
#endif
#define THRESHOLD_APIC_VECTOR 0xf9
#define REBOOT_VECTOR 0xf8
/* f0-f7 used for spreading out TLB flushes: */
#define INVALIDATE_TLB_VECTOR_END 0xf7
......@@ -117,6 +113,13 @@
*/
#define LOCAL_PENDING_VECTOR 0xec
#define UV_BAU_MESSAGE 0xec
/*
* Self IPI vector for machine checks
*/
#define MCE_SELF_VECTOR 0xeb
/*
* First APIC vector available to drivers: (vectors 0x30-0xee) we
* start at 0x31(0x41) to spread out vectors evenly between priority
......
#ifndef _ASM_X86_MCE_H
#define _ASM_X86_MCE_H
#ifdef __x86_64__
#include <linux/types.h>
#include <asm/ioctls.h>
......@@ -10,21 +8,35 @@
* Machine Check support for x86
*/
#define MCG_CTL_P (1UL<<8) /* MCG_CAP register available */
#define MCG_BANKCNT_MASK 0xff /* Number of Banks */
#define MCG_CTL_P (1ULL<<8) /* MCG_CAP register available */
#define MCG_EXT_P (1ULL<<9) /* Extended registers available */
#define MCG_CMCI_P (1ULL<<10) /* CMCI supported */
#define MCG_STATUS_RIPV (1UL<<0) /* restart ip valid */
#define MCG_STATUS_EIPV (1UL<<1) /* ip points to correct instruction */
#define MCG_STATUS_MCIP (1UL<<2) /* machine check in progress */
#define MCI_STATUS_VAL (1UL<<63) /* valid error */
#define MCI_STATUS_OVER (1UL<<62) /* previous errors lost */
#define MCI_STATUS_UC (1UL<<61) /* uncorrected error */
#define MCI_STATUS_EN (1UL<<60) /* error enabled */
#define MCI_STATUS_MISCV (1UL<<59) /* misc error reg. valid */
#define MCI_STATUS_ADDRV (1UL<<58) /* addr reg. valid */
#define MCI_STATUS_PCC (1UL<<57) /* processor context corrupt */
#define MCG_EXT_CNT_MASK 0xff0000 /* Number of Extended registers */
#define MCG_EXT_CNT_SHIFT 16
#define MCG_EXT_CNT(c) (((c) & MCG_EXT_CNT_MASK) >> MCG_EXT_CNT_SHIFT)
#define MCG_SER_P (1ULL<<24) /* MCA recovery/new status bits */
#define MCG_STATUS_RIPV (1ULL<<0) /* restart ip valid */
#define MCG_STATUS_EIPV (1ULL<<1) /* ip points to correct instruction */
#define MCG_STATUS_MCIP (1ULL<<2) /* machine check in progress */
#define MCI_STATUS_VAL (1ULL<<63) /* valid error */
#define MCI_STATUS_OVER (1ULL<<62) /* previous errors lost */
#define MCI_STATUS_UC (1ULL<<61) /* uncorrected error */
#define MCI_STATUS_EN (1ULL<<60) /* error enabled */
#define MCI_STATUS_MISCV (1ULL<<59) /* misc error reg. valid */
#define MCI_STATUS_ADDRV (1ULL<<58) /* addr reg. valid */
#define MCI_STATUS_PCC (1ULL<<57) /* processor context corrupt */
#define MCI_STATUS_S (1ULL<<56) /* Signaled machine check */
#define MCI_STATUS_AR (1ULL<<55) /* Action required */
/* MISC register defines */
#define MCM_ADDR_SEGOFF 0 /* segment offset */
#define MCM_ADDR_LINEAR 1 /* linear address */
#define MCM_ADDR_PHYS 2 /* physical address */
#define MCM_ADDR_MEM 3 /* memory address */
#define MCM_ADDR_GENERIC 7 /* generic */
/* Fields are zero when not available */
struct mce {
......@@ -34,13 +46,19 @@ struct mce {
__u64 mcgstatus;
__u64 ip;
__u64 tsc; /* cpu time stamp counter */
__u64 res1; /* for future extension */
__u64 res2; /* dito. */
__u64 time; /* wall time_t when error was detected */
__u8 cpuvendor; /* cpu vendor as encoded in system.h */
__u8 pad1;
__u16 pad2;
__u32 cpuid; /* CPUID 1 EAX */
__u8 cs; /* code segment */
__u8 bank; /* machine check bank */
__u8 cpu; /* cpu that raised the error */
__u8 cpu; /* cpu number; obsolete; use extcpu now */
__u8 finished; /* entry is valid */
__u32 pad;
__u32 extcpu; /* linux cpu number that detected the error */
__u32 socketid; /* CPU socket ID */
__u32 apicid; /* CPU initial apic ID */
__u64 mcgcap; /* MCGCAP MSR: machine check capabilities of CPU */
};
/*
......@@ -57,7 +75,7 @@ struct mce_log {
unsigned len; /* = MCE_LOG_LEN */
unsigned next;
unsigned flags;
unsigned pad0;
unsigned recordlen; /* length of struct mce */
struct mce entry[MCE_LOG_LEN];
};
......@@ -82,19 +100,16 @@ struct mce_log {
#define K8_MCE_THRESHOLD_BANK_5 (MCE_THRESHOLD_BASE + 5 * 9)
#define K8_MCE_THRESHOLD_DRAM_ECC (MCE_THRESHOLD_BANK_4 + 0)
#endif /* __x86_64__ */
#ifdef __KERNEL__
#ifdef CONFIG_X86_32
extern int mce_disabled;
#else /* CONFIG_X86_32 */
#include <asm/atomic.h>
#include <linux/percpu.h>
void mce_setup(struct mce *m);
void mce_log(struct mce *m);
DECLARE_PER_CPU(struct sys_device, device_mce);
DECLARE_PER_CPU(struct sys_device, mce_dev);
extern void (*threshold_cpu_callback)(unsigned long action, unsigned int cpu);
/*
......@@ -104,6 +119,8 @@ extern void (*threshold_cpu_callback)(unsigned long action, unsigned int cpu);
#define MAX_NR_BANKS (MCE_EXTENDED_BANK - 1)
#ifdef CONFIG_X86_MCE_INTEL
extern int mce_cmci_disabled;
extern int mce_ignore_ce;
void mce_intel_feature_init(struct cpuinfo_x86 *c);
void cmci_clear(void);
void cmci_reenable(void);
......@@ -123,13 +140,16 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c);
static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
#endif
extern int mce_available(struct cpuinfo_x86 *c);
int mce_available(struct cpuinfo_x86 *c);
DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);
void mce_log_therm_throt_event(__u64 status);
extern atomic_t mce_entry;
extern void do_machine_check(struct pt_regs *, long);
void do_machine_check(struct pt_regs *, long);
typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
......@@ -139,14 +159,16 @@ enum mcp_flags {
MCP_UC = (1 << 1), /* log uncorrected errors */
MCP_DONTLOG = (1 << 2), /* only clear, don't log */
};
extern void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
extern int mce_notify_user(void);
int mce_notify_irq(void);
void mce_notify_process(void);
#endif /* !CONFIG_X86_32 */
DECLARE_PER_CPU(struct mce, injectm);
extern struct file_operations mce_chrdev_ops;
#ifdef CONFIG_X86_MCE
extern void mcheck_init(struct cpuinfo_x86 *c);
void mcheck_init(struct cpuinfo_x86 *c);
#else
#define mcheck_init(c) do { } while (0)
#endif
......
......@@ -207,7 +207,14 @@
#define MSR_IA32_THERM_CONTROL 0x0000019a
#define MSR_IA32_THERM_INTERRUPT 0x0000019b
#define THERM_INT_LOW_ENABLE (1 << 0)
#define THERM_INT_HIGH_ENABLE (1 << 1)
#define MSR_IA32_THERM_STATUS 0x0000019c
#define THERM_STATUS_PROCHOT (1 << 0)
#define MSR_IA32_MISC_ENABLE 0x000001a0
/* MISC_ENABLE bits: architectural */
......
......@@ -899,7 +899,7 @@ void clear_local_APIC(void)
}
/* lets not touch this if we didn't frob it */
#if defined(CONFIG_X86_MCE_P4THERMAL) || defined(CONFIG_X86_MCE_INTEL)
#ifdef CONFIG_X86_THERMAL_VECTOR
if (maxlvt >= 5) {
v = apic_read(APIC_LVTTHMR);
apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED);
......@@ -2017,7 +2017,7 @@ static int lapic_suspend(struct sys_device *dev, pm_message_t state)
apic_pm_state.apic_lvterr = apic_read(APIC_LVTERR);
apic_pm_state.apic_tmict = apic_read(APIC_TMICT);
apic_pm_state.apic_tdcr = apic_read(APIC_TDCR);
#if defined(CONFIG_X86_MCE_P4THERMAL) || defined(CONFIG_X86_MCE_INTEL)
#ifdef CONFIG_X86_THERMAL_VECTOR
if (maxlvt >= 5)
apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
#endif
......
......@@ -66,7 +66,7 @@ static inline unsigned int get_nmi_count(int cpu)
static inline int mce_in_progress(void)
{
#if defined(CONFIG_X86_64) && defined(CONFIG_X86_MCE)
#if defined(CONFIG_X86_NEW_MCE)
return atomic_read(&mce_entry) > 0;
#endif
return 0;
......
obj-y = mce_$(BITS).o therm_throt.o
obj-y = mce.o therm_throt.o
obj-$(CONFIG_X86_32) += k7.o p4.o p5.o p6.o winchip.o
obj-$(CONFIG_X86_MCE_INTEL) += mce_intel_64.o
obj-$(CONFIG_X86_NEW_MCE) += mce-severity.o
obj-$(CONFIG_X86_OLD_MCE) += k7.o p4.o p6.o
obj-$(CONFIG_X86_ANCIENT_MCE) += winchip.o p5.o
obj-$(CONFIG_X86_MCE_P4THERMAL) += mce_intel.o
obj-$(CONFIG_X86_MCE_INTEL) += mce_intel_64.o mce_intel.o
obj-$(CONFIG_X86_MCE_AMD) += mce_amd_64.o
obj-$(CONFIG_X86_MCE_NONFATAL) += non-fatal.o
obj-$(CONFIG_X86_MCE_THRESHOLD) += threshold.o
obj-$(CONFIG_X86_MCE_INJECT) += mce-inject.o
......@@ -2,11 +2,10 @@
* Athlon specific Machine Check Exception Reporting
* (C) Copyright 2002 Dave Jones <davej@redhat.com>
*/
#include <linux/init.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/smp.h>
#include <asm/processor.h>
......@@ -15,12 +14,12 @@
#include "mce.h"
/* Machine Check Handler For AMD Athlon/Duron */
/* Machine Check Handler For AMD Athlon/Duron: */
static void k7_machine_check(struct pt_regs *regs, long error_code)
{
int recover = 1;
u32 alow, ahigh, high, low;
u32 mcgstl, mcgsth;
int recover = 1;
int i;
rdmsr(MSR_IA32_MCG_STATUS, mcgstl, mcgsth);
......@@ -32,15 +31,19 @@ static void k7_machine_check(struct pt_regs *regs, long error_code)
for (i = 1; i < nr_mce_banks; i++) {
rdmsr(MSR_IA32_MC0_STATUS+i*4, low, high);
if (high&(1<<31)) {
if (high & (1<<31)) {
char misc[20];
char addr[24];
misc[0] = addr[0] = '\0';
misc[0] = '\0';
addr[0] = '\0';
if (high & (1<<29))
recover |= 1;
if (high & (1<<25))
recover |= 2;
high &= ~(1<<31);
if (high & (1<<27)) {
rdmsr(MSR_IA32_MC0_MISC+i*4, alow, ahigh);
snprintf(misc, 20, "[%08x%08x]", ahigh, alow);
......@@ -49,27 +52,31 @@ static void k7_machine_check(struct pt_regs *regs, long error_code)
rdmsr(MSR_IA32_MC0_ADDR+i*4, alow, ahigh);
snprintf(addr, 24, " at %08x%08x", ahigh, alow);
}
printk(KERN_EMERG "CPU %d: Bank %d: %08x%08x%s%s\n",
smp_processor_id(), i, high, low, misc, addr);
/* Clear it */
/* Clear it: */
wrmsr(MSR_IA32_MC0_STATUS+i*4, 0UL, 0UL);
/* Serialize */
/* Serialize: */
wmb();
add_taint(TAINT_MACHINE_CHECK);
}
}
if (recover&2)
if (recover & 2)
panic("CPU context corrupt");
if (recover&1)
if (recover & 1)
panic("Unable to continue");
printk(KERN_EMERG "Attempting to continue.\n");
mcgstl &= ~(1<<2);
wrmsr(MSR_IA32_MCG_STATUS, mcgstl, mcgsth);
}
/* AMD K7 machine check is Intel like */
/* AMD K7 machine check is Intel like: */
void amd_mcheck_init(struct cpuinfo_x86 *c)
{
u32 l, h;
......@@ -79,21 +86,26 @@ void amd_mcheck_init(struct cpuinfo_x86 *c)
return;
machine_check_vector = k7_machine_check;
/* Make sure the vector pointer is visible before we enable MCEs: */
wmb();
printk(KERN_INFO "Intel machine check architecture supported.\n");
rdmsr(MSR_IA32_MCG_CAP, l, h);
if (l & (1<<8)) /* Control register present ? */
wrmsr(MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff);
nr_mce_banks = l & 0xff;
/* Clear status for MC index 0 separately, we don't touch CTL,
* as some K7 Athlons cause spurious MCEs when its enabled. */
/*
* Clear status for MC index 0 separately, we don't touch CTL,
* as some K7 Athlons cause spurious MCEs when its enabled:
*/
if (boot_cpu_data.x86 == 6) {
wrmsr(MSR_IA32_MC0_STATUS, 0x0, 0x0);
i = 1;
} else
i = 0;
for (; i < nr_mce_banks; i++) {
wrmsr(MSR_IA32_MC0_CTL+4*i, 0xffffffff, 0xffffffff);
wrmsr(MSR_IA32_MC0_STATUS+4*i, 0x0, 0x0);
......
/*
* Machine check injection support.
* Copyright 2008 Intel Corporation.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; version 2
* of the License.
*
* Authors:
* Andi Kleen
* Ying Huang
*/
#include <linux/uaccess.h>
#include <linux/module.h>
#include <linux/timer.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/fs.h>
#include <linux/smp.h>
#include <asm/mce.h>
/* Update fake mce registers on current CPU. */
static void inject_mce(struct mce *m)
{
struct mce *i = &per_cpu(injectm, m->extcpu);
/* Make sure noone reads partially written injectm */
i->finished = 0;
mb();
m->finished = 0;
/* First set the fields after finished */
i->extcpu = m->extcpu;
mb();
/* Now write record in order, finished last (except above) */
memcpy(i, m, sizeof(struct mce));
/* Finally activate it */
mb();
i->finished = 1;
}
struct delayed_mce {
struct timer_list timer;
struct mce m;
};
/* Inject mce on current CPU */
static void raise_mce(unsigned long data)
{
struct delayed_mce *dm = (struct delayed_mce *)data;
struct mce *m = &dm->m;
int cpu = m->extcpu;
inject_mce(m);
if (m->status & MCI_STATUS_UC) {
struct pt_regs regs;
memset(&regs, 0, sizeof(struct pt_regs));
regs.ip = m->ip;
regs.cs = m->cs;
printk(KERN_INFO "Triggering MCE exception on CPU %d\n", cpu);
do_machine_check(&regs, 0);
printk(KERN_INFO "MCE exception done on CPU %d\n", cpu);
} else {
mce_banks_t b;
memset(&b, 0xff, sizeof(mce_banks_t));
printk(KERN_INFO "Starting machine check poll CPU %d\n", cpu);
machine_check_poll(0, &b);
mce_notify_irq();
printk(KERN_INFO "Finished machine check poll on CPU %d\n",
cpu);
}
kfree(dm);
}
/* Error injection interface */
static ssize_t mce_write(struct file *filp, const char __user *ubuf,
size_t usize, loff_t *off)
{
struct delayed_mce *dm;
struct mce m;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
/*
* There are some cases where real MSR reads could slip
* through.
*/
if (!boot_cpu_has(X86_FEATURE_MCE) || !boot_cpu_has(X86_FEATURE_MCA))
return -EIO;
if ((unsigned long)usize > sizeof(struct mce))
usize = sizeof(struct mce);
if (copy_from_user(&m, ubuf, usize))
return -EFAULT;
if (m.extcpu >= num_possible_cpus() || !cpu_online(m.extcpu))
return -EINVAL;
dm = kmalloc(sizeof(struct delayed_mce), GFP_KERNEL);
if (!dm)
return -ENOMEM;
/*
* Need to give user space some time to set everything up,
* so do it a jiffie or two later everywhere.
* Should we use a hrtimer here for better synchronization?
*/
memcpy(&dm->m, &m, sizeof(struct mce));
setup_timer(&dm->timer, raise_mce, (unsigned long)dm);
dm->timer.expires = jiffies + 2;
add_timer_on(&dm->timer, m.extcpu);
return usize;
}
static int inject_init(void)
{
printk(KERN_INFO "Machine check injector initialized\n");
mce_chrdev_ops.write = mce_write;
return 0;
}
module_init(inject_init);
/*
* Cannot tolerate unloading currently because we cannot
* guarantee all openers of mce_chrdev will get a reference to us.
*/
MODULE_LICENSE("GPL");
#include <asm/mce.h>
enum severity_level {
MCE_NO_SEVERITY,
MCE_KEEP_SEVERITY,
MCE_SOME_SEVERITY,
MCE_AO_SEVERITY,
MCE_UC_SEVERITY,
MCE_AR_SEVERITY,
MCE_PANIC_SEVERITY,
};
int mce_severity(struct mce *a, int tolerant, char **msg);
extern int mce_ser;
/*
* MCE grading rules.
* Copyright 2008, 2009 Intel Corporation.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; version 2
* of the License.
*
* Author: Andi Kleen
*/
#include <linux/kernel.h>
#include <linux/seq_file.h>
#include <linux/init.h>
#include <linux/debugfs.h>
#include <asm/mce.h>
#include "mce-internal.h"
/*
* Grade an mce by severity. In general the most severe ones are processed
* first. Since there are quite a lot of combinations test the bits in a
* table-driven way. The rules are simply processed in order, first
* match wins.
*
* Note this is only used for machine check exceptions, the corrected
* errors use much simpler rules. The exceptions still check for the corrected
* errors, but only to leave them alone for the CMCI handler (except for
* panic situations)
*/
enum context { IN_KERNEL = 1, IN_USER = 2 };
enum ser { SER_REQUIRED = 1, NO_SER = 2 };
static struct severity {
u64 mask;
u64 result;
unsigned char sev;
unsigned char mcgmask;
unsigned char mcgres;
unsigned char ser;
unsigned char context;
unsigned char covered;
char *msg;
} severities[] = {
#define KERNEL .context = IN_KERNEL
#define USER .context = IN_USER
#define SER .ser = SER_REQUIRED
#define NOSER .ser = NO_SER
#define SEV(s) .sev = MCE_ ## s ## _SEVERITY
#define BITCLR(x, s, m, r...) { .mask = x, .result = 0, SEV(s), .msg = m, ## r }
#define BITSET(x, s, m, r...) { .mask = x, .result = x, SEV(s), .msg = m, ## r }
#define MCGMASK(x, res, s, m, r...) \
{ .mcgmask = x, .mcgres = res, SEV(s), .msg = m, ## r }
#define MASK(x, y, s, m, r...) \
{ .mask = x, .result = y, SEV(s), .msg = m, ## r }
#define MCI_UC_S (MCI_STATUS_UC|MCI_STATUS_S)
#define MCI_UC_SAR (MCI_STATUS_UC|MCI_STATUS_S|MCI_STATUS_AR)
#define MCACOD 0xffff
BITCLR(MCI_STATUS_VAL, NO, "Invalid"),
BITCLR(MCI_STATUS_EN, NO, "Not enabled"),
BITSET(MCI_STATUS_PCC, PANIC, "Processor context corrupt"),
/* When MCIP is not set something is very confused */
MCGMASK(MCG_STATUS_MCIP, 0, PANIC, "MCIP not set in MCA handler"),
/* Neither return not error IP -- no chance to recover -> PANIC */
MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0, PANIC,
"Neither restart nor error IP"),
MCGMASK(MCG_STATUS_RIPV, 0, PANIC, "In kernel and no restart IP",
KERNEL),
BITCLR(MCI_STATUS_UC, KEEP, "Corrected error", NOSER),
MASK(MCI_STATUS_OVER|MCI_STATUS_UC|MCI_STATUS_EN, MCI_STATUS_UC, SOME,
"Spurious not enabled", SER),
/* ignore OVER for UCNA */
MASK(MCI_UC_SAR, MCI_STATUS_UC, KEEP,
"Uncorrected no action required", SER),
MASK(MCI_STATUS_OVER|MCI_UC_SAR, MCI_STATUS_UC|MCI_STATUS_AR, PANIC,
"Illegal combination (UCNA with AR=1)", SER),
MASK(MCI_STATUS_S, 0, KEEP, "Non signalled machine check", SER),
/* AR add known MCACODs here */
MASK(MCI_STATUS_OVER|MCI_UC_SAR, MCI_STATUS_OVER|MCI_UC_SAR, PANIC,
"Action required with lost events", SER),
MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD, MCI_UC_SAR, PANIC,
"Action required; unknown MCACOD", SER),
/* known AO MCACODs: */
MASK(MCI_UC_SAR|MCI_STATUS_OVER|0xfff0, MCI_UC_S|0xc0, AO,
"Action optional: memory scrubbing error", SER),
MASK(MCI_UC_SAR|MCI_STATUS_OVER|MCACOD, MCI_UC_S|0x17a, AO,
"Action optional: last level cache writeback error", SER),
MASK(MCI_STATUS_OVER|MCI_UC_SAR, MCI_UC_S, SOME,
"Action optional unknown MCACOD", SER),
MASK(MCI_STATUS_OVER|MCI_UC_SAR, MCI_UC_S|MCI_STATUS_OVER, SOME,
"Action optional with lost events", SER),
BITSET(MCI_STATUS_UC|MCI_STATUS_OVER, PANIC, "Overflowed uncorrected"),
BITSET(MCI_STATUS_UC, UC, "Uncorrected"),
BITSET(0, SOME, "No match") /* always matches. keep at end */
};
/*
* If the EIPV bit is set, it means the saved IP is the
* instruction which caused the MCE.
*/
static int error_context(struct mce *m)
{
if (m->mcgstatus & MCG_STATUS_EIPV)
return (m->ip && (m->cs & 3) == 3) ? IN_USER : IN_KERNEL;
/* Unknown, assume kernel */
return IN_KERNEL;
}
int mce_severity(struct mce *a, int tolerant, char **msg)
{
enum context ctx = error_context(a);
struct severity *s;
for (s = severities;; s++) {
if ((a->status & s->mask) != s->result)
continue;
if ((a->mcgstatus & s->mcgmask) != s->mcgres)
continue;
if (s->ser == SER_REQUIRED && !mce_ser)
continue;
if (s->ser == NO_SER && mce_ser)
continue;
if (s->context && ctx != s->context)
continue;
if (msg)
*msg = s->msg;
s->covered = 1;
if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
if (panic_on_oops || tolerant < 1)
return MCE_PANIC_SEVERITY;
}
return s->sev;
}
}
static void *s_start(struct seq_file *f, loff_t *pos)
{
if (*pos >= ARRAY_SIZE(severities))
return NULL;
return &severities[*pos];
}
static void *s_next(struct seq_file *f, void *data, loff_t *pos)
{
if (++(*pos) >= ARRAY_SIZE(severities))
return NULL;
return &severities[*pos];
}
static void s_stop(struct seq_file *f, void *data)
{
}
static int s_show(struct seq_file *f, void *data)
{
struct severity *ser = data;
seq_printf(f, "%d\t%s\n", ser->covered, ser->msg);
return 0;
}
static const struct seq_operations severities_seq_ops = {
.start = s_start,
.next = s_next,
.stop = s_stop,
.show = s_show,
};
static int severities_coverage_open(struct inode *inode, struct file *file)
{
return seq_open(file, &severities_seq_ops);
}
static ssize_t severities_coverage_write(struct file *file,
const char __user *ubuf,
size_t count, loff_t *ppos)
{
int i;
for (i = 0; i < ARRAY_SIZE(severities); i++)
severities[i].covered = 0;
return count;
}
static const struct file_operations severities_coverage_fops = {
.open = severities_coverage_open,
.release = seq_release,
.read = seq_read,
.write = severities_coverage_write,
};
static int __init severities_debugfs_init(void)
{
struct dentry *dmce = NULL, *fseverities_coverage = NULL;
dmce = debugfs_create_dir("mce", NULL);
if (dmce == NULL)
goto err_out;
fseverities_coverage = debugfs_create_file("severities-coverage",
0444, dmce, NULL,
&severities_coverage_fops);
if (fseverities_coverage == NULL)
goto err_out;
return 0;
err_out:
if (fseverities_coverage)
debugfs_remove(fseverities_coverage);
if (dmce)
debugfs_remove(dmce);
return -ENOMEM;
}
late_initcall(severities_debugfs_init);
/*
* Machine check handler.
*
* K8 parts Copyright 2002,2003 Andi Kleen, SuSE Labs.
* Rest from unknown author(s).
* 2004 Andi Kleen. Rewrote most of it.
* Copyright 2008 Intel Corporation
* Author: Andi Kleen
*/
#include <linux/init.h>
#include <linux/types.h>
#include <linux/thread_info.h>
#include <linux/capability.h>
#include <linux/miscdevice.h>
#include <linux/interrupt.h>
#include <linux/ratelimit.h>
#include <linux/kallsyms.h>
#include <linux/rcupdate.h>
#include <linux/kobject.h>
#include <linux/uaccess.h>
#include <linux/kdebug.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/smp_lock.h>
#include <linux/percpu.h>
#include <linux/string.h>
#include <linux/rcupdate.h>
#include <linux/kallsyms.h>
#include <linux/sysdev.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/capability.h>
#include <linux/cpu.h>
#include <linux/percpu.h>
#include <linux/poll.h>
#include <linux/thread_info.h>
#include <linux/delay.h>
#include <linux/ctype.h>
#include <linux/kmod.h>
#include <linux/kdebug.h>
#include <linux/kobject.h>
#include <linux/sched.h>
#include <linux/sysfs.h>
#include <linux/ratelimit.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/kmod.h>
#include <linux/poll.h>
#include <linux/nmi.h>
#include <linux/cpu.h>
#include <linux/smp.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <asm/processor.h>
#include <asm/msr.h>
#include <asm/mce.h>
#include <asm/uaccess.h>
#include <asm/smp.h>
#include <asm/hw_irq.h>
#include <asm/apic.h>
#include <asm/idle.h>
#include <asm/ipi.h>
#include <asm/mce.h>
#include <asm/msr.h>
#include "mce-internal.h"
#include "mce.h"
/* Handle unconfigured int18 (should never happen) */
static void unexpected_machine_check(struct pt_regs *regs, long error_code)
{
printk(KERN_ERR "CPU#%d: Unexpected int18 (Machine Check).\n",
smp_processor_id());
}
/* Call the installed machine check handler for this CPU setup. */
void (*machine_check_vector)(struct pt_regs *, long error_code) =
unexpected_machine_check;
int mce_disabled;
#ifdef CONFIG_X86_NEW_MCE
#define MISC_MCELOG_MINOR 227
#define SPINUNIT 100 /* 100ns */
atomic_t mce_entry;
static int mce_dont_init;
DEFINE_PER_CPU(unsigned, mce_exception_count);
/*
* Tolerant levels:
......@@ -55,26 +82,55 @@ static u64 *bank;
static unsigned long notify_user;
static int rip_msr;
static int mce_bootlog = -1;
static atomic_t mce_events;
static int monarch_timeout = -1;
static int mce_panic_timeout;
static int mce_dont_log_ce;
int mce_cmci_disabled;
int mce_ignore_ce;
int mce_ser;
static char trigger[128];
static char *trigger_argv[2] = { trigger, NULL };
static unsigned long dont_init_banks;
static DECLARE_WAIT_QUEUE_HEAD(mce_wait);
static DEFINE_PER_CPU(struct mce, mces_seen);
static int cpu_missing;
/* MCA banks polled by the period polling timer for corrected events */
DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
[0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL
};
static inline int skip_bank_init(int i)
{
return i < BITS_PER_LONG && test_bit(i, &dont_init_banks);
}
static DEFINE_PER_CPU(struct work_struct, mce_work);
/* Do initial initialization of a struct mce */
void mce_setup(struct mce *m)
{
memset(m, 0, sizeof(struct mce));
m->cpu = smp_processor_id();
m->cpu = m->extcpu = smp_processor_id();
rdtscll(m->tsc);
/* We hope get_seconds stays lockless */
m->time = get_seconds();
m->cpuvendor = boot_cpu_data.x86_vendor;
m->cpuid = cpuid_eax(1);
#ifdef CONFIG_SMP
m->socketid = cpu_data(m->extcpu).phys_proc_id;
#endif
m->apicid = cpu_data(m->extcpu).initial_apicid;
rdmsrl(MSR_IA32_MCG_CAP, m->mcgcap);
}
DEFINE_PER_CPU(struct mce, injectm);
EXPORT_PER_CPU_SYMBOL_GPL(injectm);
/*
* Lockless MCE logging infrastructure.
* This avoids deadlocks on printk locks without having to break locks. Also
......@@ -82,26 +138,31 @@ void mce_setup(struct mce *m)
*/
static struct mce_log mcelog = {
MCE_LOG_SIGNATURE,
MCE_LOG_LEN,
.signature = MCE_LOG_SIGNATURE,
.len = MCE_LOG_LEN,
.recordlen = sizeof(struct mce),
};
void mce_log(struct mce *mce)
{
unsigned next, entry;
atomic_inc(&mce_events);
mce->finished = 0;
wmb();
for (;;) {
entry = rcu_dereference(mcelog.next);
for (;;) {
/* When the buffer fills up discard new entries. Assume
that the earlier errors are the more interesting. */
/*
* When the buffer fills up discard new entries.
* Assume that the earlier errors are the more
* interesting ones:
*/
if (entry >= MCE_LOG_LEN) {
set_bit(MCE_OVERFLOW, (unsigned long *)&mcelog.flags);
set_bit(MCE_OVERFLOW,
(unsigned long *)&mcelog.flags);
return;
}
/* Old left over entry. Skip. */
/* Old left over entry. Skip: */
if (mcelog.entry[entry].finished) {
entry++;
continue;
......@@ -118,16 +179,15 @@ void mce_log(struct mce *mce)
mcelog.entry[entry].finished = 1;
wmb();
mce->finished = 1;
set_bit(0, &notify_user);
}
static void print_mce(struct mce *m)
{
printk(KERN_EMERG "\n"
KERN_EMERG "HARDWARE ERROR\n"
KERN_EMERG
printk(KERN_EMERG
"CPU %d: Machine Check Exception: %16Lx Bank %d: %016Lx\n",
m->cpu, m->mcgstatus, m->bank, m->status);
m->extcpu, m->mcgstatus, m->bank, m->status);
if (m->ip) {
printk(KERN_EMERG "RIP%s %02x:<%016Lx> ",
!(m->mcgstatus & MCG_STATUS_EIPV) ? " !INEXACT!" : "",
......@@ -142,68 +202,298 @@ static void print_mce(struct mce *m)
if (m->misc)
printk("MISC %llx ", m->misc);
printk("\n");
printk(KERN_EMERG "This is not a software problem!\n");
printk(KERN_EMERG "Run through mcelog --ascii to decode "
"and contact your hardware vendor\n");
printk(KERN_EMERG "PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x\n",
m->cpuvendor, m->cpuid, m->time, m->socketid,
m->apicid);
}
static void mce_panic(char *msg, struct mce *backup, unsigned long start)
static void print_mce_head(void)
{
printk(KERN_EMERG "\n" KERN_EMERG "HARDWARE ERROR\n");
}
static void print_mce_tail(void)
{
printk(KERN_EMERG "This is not a software problem!\n"
KERN_EMERG "Run through mcelog --ascii to decode and contact your hardware vendor\n");
}
#define PANIC_TIMEOUT 5 /* 5 seconds */
static atomic_t mce_paniced;
/* Panic in progress. Enable interrupts and wait for final IPI */
static void wait_for_panic(void)
{
long timeout = PANIC_TIMEOUT*USEC_PER_SEC;
preempt_disable();
local_irq_enable();
while (timeout-- > 0)
udelay(1);
if (panic_timeout == 0)
panic_timeout = mce_panic_timeout;
panic("Panicing machine check CPU died");
}
static void mce_panic(char *msg, struct mce *final, char *exp)
{
int i;
oops_begin();
for (i = 0; i < MCE_LOG_LEN; i++) {
unsigned long tsc = mcelog.entry[i].tsc;
/*
* Make sure only one CPU runs in machine check panic
*/
if (atomic_add_return(1, &mce_paniced) > 1)
wait_for_panic();
barrier();
if (time_before(tsc, start))
bust_spinlocks(1);
console_verbose();
print_mce_head();
/* First print corrected ones that are still unlogged */
for (i = 0; i < MCE_LOG_LEN; i++) {
struct mce *m = &mcelog.entry[i];
if (!(m->status & MCI_STATUS_VAL))
continue;
if (!(m->status & MCI_STATUS_UC))
print_mce(m);
}
/* Now print uncorrected but with the final one last */
for (i = 0; i < MCE_LOG_LEN; i++) {
struct mce *m = &mcelog.entry[i];
if (!(m->status & MCI_STATUS_VAL))
continue;
print_mce(&mcelog.entry[i]);
if (backup && mcelog.entry[i].tsc == backup->tsc)
backup = NULL;
if (!(m->status & MCI_STATUS_UC))
continue;
if (!final || memcmp(m, final, sizeof(struct mce)))
print_mce(m);
}
if (backup)
print_mce(backup);
if (final)
print_mce(final);
if (cpu_missing)
printk(KERN_EMERG "Some CPUs didn't answer in synchronization\n");
print_mce_tail();
if (exp)
printk(KERN_EMERG "Machine check: %s\n", exp);
if (panic_timeout == 0)
panic_timeout = mce_panic_timeout;
panic(msg);
}
/* Support code for software error injection */
static int msr_to_offset(u32 msr)
{
unsigned bank = __get_cpu_var(injectm.bank);
if (msr == rip_msr)
return offsetof(struct mce, ip);
if (msr == MSR_IA32_MC0_STATUS + bank*4)
return offsetof(struct mce, status);
if (msr == MSR_IA32_MC0_ADDR + bank*4)
return offsetof(struct mce, addr);
if (msr == MSR_IA32_MC0_MISC + bank*4)
return offsetof(struct mce, misc);
if (msr == MSR_IA32_MCG_STATUS)
return offsetof(struct mce, mcgstatus);
return -1;
}
/* MSR access wrappers used for error injection */
static u64 mce_rdmsrl(u32 msr)
{
u64 v;
if (__get_cpu_var(injectm).finished) {
int offset = msr_to_offset(msr);
if (offset < 0)
return 0;
return *(u64 *)((char *)&__get_cpu_var(injectm) + offset);
}
rdmsrl(msr, v);
return v;
}
static void mce_wrmsrl(u32 msr, u64 v)
{
if (__get_cpu_var(injectm).finished) {
int offset = msr_to_offset(msr);
if (offset >= 0)
*(u64 *)((char *)&__get_cpu_var(injectm) + offset) = v;
return;
}
wrmsrl(msr, v);
}
/*
* Simple lockless ring to communicate PFNs from the exception handler with the
* process context work function. This is vastly simplified because there's
* only a single reader and a single writer.
*/
#define MCE_RING_SIZE 16 /* we use one entry less */
struct mce_ring {
unsigned short start;
unsigned short end;
unsigned long ring[MCE_RING_SIZE];
};
static DEFINE_PER_CPU(struct mce_ring, mce_ring);
/* Runs with CPU affinity in workqueue */
static int mce_ring_empty(void)
{
struct mce_ring *r = &__get_cpu_var(mce_ring);
return r->start == r->end;
}
static int mce_ring_get(unsigned long *pfn)
{
struct mce_ring *r;
int ret = 0;
*pfn = 0;
get_cpu();
r = &__get_cpu_var(mce_ring);
if (r->start == r->end)
goto out;
*pfn = r->ring[r->start];
r->start = (r->start + 1) % MCE_RING_SIZE;
ret = 1;
out:
put_cpu();
return ret;
}
/* Always runs in MCE context with preempt off */
static int mce_ring_add(unsigned long pfn)
{
struct mce_ring *r = &__get_cpu_var(mce_ring);
unsigned next;
next = (r->end + 1) % MCE_RING_SIZE;
if (next == r->start)
return -1;
r->ring[r->end] = pfn;
wmb();
r->end = next;
return 0;
}
int mce_available(struct cpuinfo_x86 *c)
{
if (mce_dont_init)
if (mce_disabled)
return 0;
return cpu_has(c, X86_FEATURE_MCE) && cpu_has(c, X86_FEATURE_MCA);
}
static void mce_schedule_work(void)
{
if (!mce_ring_empty()) {
struct work_struct *work = &__get_cpu_var(mce_work);
if (!work_pending(work))
schedule_work(work);
}
}
/*
* Get the address of the instruction at the time of the machine check
* error.
*/
static inline void mce_get_rip(struct mce *m, struct pt_regs *regs)
{
if (regs && (m->mcgstatus & MCG_STATUS_RIPV)) {
if (regs && (m->mcgstatus & (MCG_STATUS_RIPV|MCG_STATUS_EIPV))) {
m->ip = regs->ip;
m->cs = regs->cs;
} else {
m->ip = 0;
m->cs = 0;
}
if (rip_msr) {
/* Assume the RIP in the MSR is exact. Is this true? */
m->mcgstatus |= MCG_STATUS_EIPV;
rdmsrl(rip_msr, m->ip);
m->cs = 0;
if (rip_msr)
m->ip = mce_rdmsrl(rip_msr);
}
#ifdef CONFIG_X86_LOCAL_APIC
/*
* Called after interrupts have been reenabled again
* when a MCE happened during an interrupts off region
* in the kernel.
*/
asmlinkage void smp_mce_self_interrupt(struct pt_regs *regs)
{
ack_APIC_irq();
exit_idle();
irq_enter();
mce_notify_irq();
mce_schedule_work();
irq_exit();
}
#endif
static void mce_report_event(struct pt_regs *regs)
{
if (regs->flags & (X86_VM_MASK|X86_EFLAGS_IF)) {
mce_notify_irq();
/*
* Triggering the work queue here is just an insurance
* policy in case the syscall exit notify handler
* doesn't run soon enough or ends up running on the
* wrong CPU (can happen when audit sleeps)
*/
mce_schedule_work();
return;
}
#ifdef CONFIG_X86_LOCAL_APIC
/*
* Without APIC do not notify. The event will be picked
* up eventually.
*/
if (!cpu_has_apic)
return;
/*
* When interrupts are disabled we cannot use
* kernel services safely. Trigger an self interrupt
* through the APIC to instead do the notification
* after interrupts are reenabled again.
*/
apic->send_IPI_self(MCE_SELF_VECTOR);
/*
* Wait for idle afterwards again so that we don't leave the
* APIC in a non idle state because the normal APIC writes
* cannot exclude us.
*/
apic_wait_icr_idle();
#endif
}
DEFINE_PER_CPU(unsigned, mce_poll_count);
/*
* Poll for corrected events or events that happened before reset.
* Those are just logged through /dev/mcelog.
*
* This is executed in standard interrupt context.
*
* Note: spec recommends to panic for fatal unsignalled
* errors here. However this would be quite problematic --
* we would need to reimplement the Monarch handling and
* it would mess up the exclusion between exception handler
* and poll hander -- * so we skip this for now.
* These cases should not happen anyways, or only when the CPU
* is already totally * confused. In this case it's likely it will
* not fully execute the machine check handler either.
*/
void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
struct mce m;
int i;
__get_cpu_var(mce_poll_count)++;
mce_setup(&m);
rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
m.mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
for (i = 0; i < banks; i++) {
if (!bank[i] || !test_bit(i, *b))
continue;
......@@ -214,24 +504,24 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
m.tsc = 0;
barrier();
rdmsrl(MSR_IA32_MC0_STATUS + i*4, m.status);
m.status = mce_rdmsrl(MSR_IA32_MC0_STATUS + i*4);
if (!(m.status & MCI_STATUS_VAL))
continue;
/*
* Uncorrected events are handled by the exception handler
* when it is enabled. But when the exception is disabled log
* everything.
* Uncorrected or signalled events are handled by the exception
* handler when it is enabled, so don't process those here.
*
* TBD do the same check for MCI_STATUS_EN here?
*/
if ((m.status & MCI_STATUS_UC) && !(flags & MCP_UC))
if (!(flags & MCP_UC) &&
(m.status & (mce_ser ? MCI_STATUS_S : MCI_STATUS_UC)))
continue;
if (m.status & MCI_STATUS_MISCV)
rdmsrl(MSR_IA32_MC0_MISC + i*4, m.misc);
m.misc = mce_rdmsrl(MSR_IA32_MC0_MISC + i*4);
if (m.status & MCI_STATUS_ADDRV)
rdmsrl(MSR_IA32_MC0_ADDR + i*4, m.addr);
m.addr = mce_rdmsrl(MSR_IA32_MC0_ADDR + i*4);
if (!(flags & MCP_TIMESTAMP))
m.tsc = 0;
......@@ -239,7 +529,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
* Don't get the IP here because it's unlikely to
* have anything to do with the actual error location.
*/
if (!(flags & MCP_DONTLOG)) {
if (!(flags & MCP_DONTLOG) && !mce_dont_log_ce) {
mce_log(&m);
add_taint(TAINT_MACHINE_CHECK);
}
......@@ -247,13 +537,307 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
/*
* Clear state for this bank.
*/
wrmsrl(MSR_IA32_MC0_STATUS+4*i, 0);
mce_wrmsrl(MSR_IA32_MC0_STATUS+4*i, 0);
}
/*
* Don't clear MCG_STATUS here because it's only defined for
* exceptions.
*/
sync_core();
}
EXPORT_SYMBOL_GPL(machine_check_poll);
/*
* Do a quick check if any of the events requires a panic.
* This decides if we keep the events around or clear them.
*/
static int mce_no_way_out(struct mce *m, char **msg)
{
int i;
for (i = 0; i < banks; i++) {
m->status = mce_rdmsrl(MSR_IA32_MC0_STATUS + i*4);
if (mce_severity(m, tolerant, msg) >= MCE_PANIC_SEVERITY)
return 1;
}
return 0;
}
/*
* Variable to establish order between CPUs while scanning.
* Each CPU spins initially until executing is equal its number.
*/
static atomic_t mce_executing;
/*
* Defines order of CPUs on entry. First CPU becomes Monarch.
*/
static atomic_t mce_callin;
/*
* Check if a timeout waiting for other CPUs happened.
*/
static int mce_timed_out(u64 *t)
{
/*
* The others already did panic for some reason.
* Bail out like in a timeout.
* rmb() to tell the compiler that system_state
* might have been modified by someone else.
*/
rmb();
if (atomic_read(&mce_paniced))
wait_for_panic();
if (!monarch_timeout)
goto out;
if ((s64)*t < SPINUNIT) {
/* CHECKME: Make panic default for 1 too? */
if (tolerant < 1)
mce_panic("Timeout synchronizing machine check over CPUs",
NULL, NULL);
cpu_missing = 1;
return 1;
}
*t -= SPINUNIT;
out:
touch_nmi_watchdog();
return 0;
}
/*
* The Monarch's reign. The Monarch is the CPU who entered
* the machine check handler first. It waits for the others to
* raise the exception too and then grades them. When any
* error is fatal panic. Only then let the others continue.
*
* The other CPUs entering the MCE handler will be controlled by the
* Monarch. They are called Subjects.
*
* This way we prevent any potential data corruption in a unrecoverable case
* and also makes sure always all CPU's errors are examined.
*
* Also this detects the case of an machine check event coming from outer
* space (not detected by any CPUs) In this case some external agent wants
* us to shut down, so panic too.
*
* The other CPUs might still decide to panic if the handler happens
* in a unrecoverable place, but in this case the system is in a semi-stable
* state and won't corrupt anything by itself. It's ok to let the others
* continue for a bit first.
*
* All the spin loops have timeouts; when a timeout happens a CPU
* typically elects itself to be Monarch.
*/
static void mce_reign(void)
{
int cpu;
struct mce *m = NULL;
int global_worst = 0;
char *msg = NULL;
char *nmsg = NULL;
/*
* This CPU is the Monarch and the other CPUs have run
* through their handlers.
* Grade the severity of the errors of all the CPUs.
*/
for_each_possible_cpu(cpu) {
int severity = mce_severity(&per_cpu(mces_seen, cpu), tolerant,
&nmsg);
if (severity > global_worst) {
msg = nmsg;
global_worst = severity;
m = &per_cpu(mces_seen, cpu);
}
}
/*
* Cannot recover? Panic here then.
* This dumps all the mces in the log buffer and stops the
* other CPUs.
*/
if (m && global_worst >= MCE_PANIC_SEVERITY && tolerant < 3)
mce_panic("Fatal Machine check", m, msg);
/*
* For UC somewhere we let the CPU who detects it handle it.
* Also must let continue the others, otherwise the handling
* CPU could deadlock on a lock.
*/
/*
* No machine check event found. Must be some external
* source or one CPU is hung. Panic.
*/
if (!m && tolerant < 3)
mce_panic("Machine check from unknown source", NULL, NULL);
/*
* Now clear all the mces_seen so that they don't reappear on
* the next mce.
*/
for_each_possible_cpu(cpu)
memset(&per_cpu(mces_seen, cpu), 0, sizeof(struct mce));
}
static atomic_t global_nwo;
/*
* Start of Monarch synchronization. This waits until all CPUs have
* entered the exception handler and then determines if any of them
* saw a fatal event that requires panic. Then it executes them
* in the entry order.
* TBD double check parallel CPU hotunplug
*/
static int mce_start(int no_way_out, int *order)
{
int nwo;
int cpus = num_online_cpus();
u64 timeout = (u64)monarch_timeout * NSEC_PER_USEC;
if (!timeout) {
*order = -1;
return no_way_out;
}
atomic_add(no_way_out, &global_nwo);
/*
* Wait for everyone.
*/
while (atomic_read(&mce_callin) != cpus) {
if (mce_timed_out(&timeout)) {
atomic_set(&global_nwo, 0);
*order = -1;
return no_way_out;
}
ndelay(SPINUNIT);
}
/*
* Cache the global no_way_out state.
*/
nwo = atomic_read(&global_nwo);
/*
* Monarch starts executing now, the others wait.
*/
if (*order == 1) {
atomic_set(&mce_executing, 1);
return nwo;
}
/*
* Now start the scanning loop one by one
* in the original callin order.
* This way when there are any shared banks it will
* be only seen by one CPU before cleared, avoiding duplicates.
*/
while (atomic_read(&mce_executing) < *order) {
if (mce_timed_out(&timeout)) {
atomic_set(&global_nwo, 0);
*order = -1;
return no_way_out;
}
ndelay(SPINUNIT);
}
return nwo;
}
/*
* Synchronize between CPUs after main scanning loop.
* This invokes the bulk of the Monarch processing.
*/
static int mce_end(int order)
{
int ret = -1;
u64 timeout = (u64)monarch_timeout * NSEC_PER_USEC;
if (!timeout)
goto reset;
if (order < 0)
goto reset;
/*
* Allow others to run.
*/
atomic_inc(&mce_executing);
if (order == 1) {
/* CHECKME: Can this race with a parallel hotplug? */
int cpus = num_online_cpus();
/*
* Monarch: Wait for everyone to go through their scanning
* loops.
*/
while (atomic_read(&mce_executing) <= cpus) {
if (mce_timed_out(&timeout))
goto reset;
ndelay(SPINUNIT);
}
mce_reign();
barrier();
ret = 0;
} else {
/*
* Subject: Wait for Monarch to finish.
*/
while (atomic_read(&mce_executing) != 0) {
if (mce_timed_out(&timeout))
goto reset;
ndelay(SPINUNIT);
}
/*
* Don't reset anything. That's done by the Monarch.
*/
return 0;
}
/*
* Reset all global state.
*/
reset:
atomic_set(&global_nwo, 0);
atomic_set(&mce_callin, 0);
barrier();
/*
* Let others run again.
*/
atomic_set(&mce_executing, 0);
return ret;
}
/*
* Check if the address reported by the CPU is in a format we can parse.
* It would be possible to add code for most other cases, but all would
* be somewhat complicated (e.g. segment offset would require an instruction
* parser). So only support physical addresses upto page granuality for now.
*/
static int mce_usable_address(struct mce *m)
{
if (!(m->status & MCI_STATUS_MISCV) || !(m->status & MCI_STATUS_ADDRV))
return 0;
if ((m->misc & 0x3f) > PAGE_SHIFT)
return 0;
if (((m->misc >> 6) & 7) != MCM_ADDR_PHYS)
return 0;
return 1;
}
static void mce_clear_state(unsigned long *toclear)
{
int i;
for (i = 0; i < banks; i++) {
if (test_bit(i, toclear))
mce_wrmsrl(MSR_IA32_MC0_STATUS+4*i, 0);
}
}
/*
......@@ -263,13 +847,23 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
* This is executed in NMI context not subject to normal locking rules. This
* implies that most kernel services cannot be safely used. Don't even
* think about putting a printk in there!
*
* On Intel systems this is entered on all CPUs in parallel through
* MCE broadcast. However some CPUs might be broken beyond repair,
* so be always careful when synchronizing with others.
*/
void do_machine_check(struct pt_regs * regs, long error_code)
void do_machine_check(struct pt_regs *regs, long error_code)
{
struct mce m, panicm;
u64 mcestart = 0;
struct mce m, *final;
int i;
int panicm_found = 0;
int worst = 0;
int severity;
/*
* Establish sequential order between the CPUs entering the machine
* check handler.
*/
int order;
/*
* If no_way_out gets set, there is no safe way to recover from this
* MCE. If tolerant is cranked up, we'll try anyway.
......@@ -281,25 +875,41 @@ void do_machine_check(struct pt_regs * regs, long error_code)
*/
int kill_it = 0;
DECLARE_BITMAP(toclear, MAX_NR_BANKS);
char *msg = "Unknown";
atomic_inc(&mce_entry);
__get_cpu_var(mce_exception_count)++;
if (notify_die(DIE_NMI, "machine check", regs, error_code,
18, SIGKILL) == NOTIFY_STOP)
goto out2;
goto out;
if (!banks)
goto out2;
goto out;
order = atomic_add_return(1, &mce_callin);
mce_setup(&m);
rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
/* if the restart IP is not valid, we're done for */
if (!(m.mcgstatus & MCG_STATUS_RIPV))
no_way_out = 1;
m.mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
no_way_out = mce_no_way_out(&m, &msg);
final = &__get_cpu_var(mces_seen);
*final = m;
rdtscll(mcestart);
barrier();
/*
* When no restart IP must always kill or panic.
*/
if (!(m.mcgstatus & MCG_STATUS_RIPV))
kill_it = 1;
/*
* Go through all the banks in exclusion of the other CPUs.
* This way we don't report duplicated events on shared banks
* because the first one to see it will clear it.
*/
no_way_out = mce_start(no_way_out, &order);
for (i = 0; i < banks; i++) {
__clear_bit(i, toclear);
if (!bank[i])
......@@ -309,15 +919,16 @@ void do_machine_check(struct pt_regs * regs, long error_code)
m.addr = 0;
m.bank = i;
rdmsrl(MSR_IA32_MC0_STATUS + i*4, m.status);
m.status = mce_rdmsrl(MSR_IA32_MC0_STATUS + i*4);
if ((m.status & MCI_STATUS_VAL) == 0)
continue;
/*
* Non uncorrected errors are handled by machine_check_poll
* Leave them alone.
* Non uncorrected or non signaled errors are handled by
* machine_check_poll. Leave them alone, unless this panics.
*/
if ((m.status & MCI_STATUS_UC) == 0)
if (!(m.status & (mce_ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
!no_way_out)
continue;
/*
......@@ -325,22 +936,16 @@ void do_machine_check(struct pt_regs * regs, long error_code)
*/
add_taint(TAINT_MACHINE_CHECK);
__set_bit(i, toclear);
severity = mce_severity(&m, tolerant, NULL);
if (m.status & MCI_STATUS_EN) {
/* if PCC was set, there's no way out */
no_way_out |= !!(m.status & MCI_STATUS_PCC);
/*
* If this error was uncorrectable and there was
* an overflow, we're in trouble. If no overflow,
* we might get away with just killing a task.
* When machine check was for corrected handler don't touch,
* unless we're panicing.
*/
if (m.status & MCI_STATUS_UC) {
if (tolerant < 1 || m.status & MCI_STATUS_OVER)
no_way_out = 1;
kill_it = 1;
}
} else {
if (severity == MCE_KEEP_SEVERITY && !no_way_out)
continue;
__set_bit(i, toclear);
if (severity == MCE_NO_SEVERITY) {
/*
* Machine check event was not enabled. Clear, but
* ignore.
......@@ -348,34 +953,55 @@ void do_machine_check(struct pt_regs * regs, long error_code)
continue;
}
/*
* Kill on action required.
*/
if (severity == MCE_AR_SEVERITY)
kill_it = 1;
if (m.status & MCI_STATUS_MISCV)
rdmsrl(MSR_IA32_MC0_MISC + i*4, m.misc);
m.misc = mce_rdmsrl(MSR_IA32_MC0_MISC + i*4);
if (m.status & MCI_STATUS_ADDRV)
rdmsrl(MSR_IA32_MC0_ADDR + i*4, m.addr);
m.addr = mce_rdmsrl(MSR_IA32_MC0_ADDR + i*4);
/*
* Action optional error. Queue address for later processing.
* When the ring overflows we just ignore the AO error.
* RED-PEN add some logging mechanism when
* usable_address or mce_add_ring fails.
* RED-PEN don't ignore overflow for tolerant == 0
*/
if (severity == MCE_AO_SEVERITY && mce_usable_address(&m))
mce_ring_add(m.addr >> PAGE_SHIFT);
mce_get_rip(&m, regs);
mce_log(&m);
/* Did this bank cause the exception? */
/* Assume that the bank with uncorrectable errors did it,
and that there is only a single one. */
if ((m.status & MCI_STATUS_UC) && (m.status & MCI_STATUS_EN)) {
panicm = m;
panicm_found = 1;
if (severity > worst) {
*final = m;
worst = severity;
}
}
/* If we didn't find an uncorrectable error, pick
the last one (shouldn't happen, just being safe). */
if (!panicm_found)
panicm = m;
if (!no_way_out)
mce_clear_state(toclear);
/*
* Do most of the synchronization with other CPUs.
* When there's any problem use only local no_way_out state.
*/
if (mce_end(order) < 0)
no_way_out = worst >= MCE_PANIC_SEVERITY;
/*
* If we have decided that we just CAN'T continue, and the user
* has not set tolerant to an insane level, give up and die.
*
* This is mainly used in the case when the system doesn't
* support MCE broadcasting or it has been disabled.
*/
if (no_way_out && tolerant < 3)
mce_panic("Machine check", &panicm, mcestart);
mce_panic("Fatal machine check on current CPU", final, msg);
/*
* If the error seems to be unrecoverable, something should be
......@@ -383,45 +1009,52 @@ void do_machine_check(struct pt_regs * regs, long error_code)
* one task, do that. If the user has set the tolerance very
* high, don't try to do anything at all.
*/
if (kill_it && tolerant < 3) {
int user_space = 0;
/*
* If the EIPV bit is set, it means the saved IP is the
* instruction which caused the MCE.
*/
if (m.mcgstatus & MCG_STATUS_EIPV)
user_space = panicm.ip && (panicm.cs & 3);
/*
* If we know that the error was in user space, send a
* SIGBUS. Otherwise, panic if tolerance is low.
*
* force_sig() takes an awful lot of locks and has a slight
* risk of deadlocking.
*/
if (user_space) {
if (kill_it && tolerant < 3)
force_sig(SIGBUS, current);
} else if (panic_on_oops || tolerant < 2) {
mce_panic("Uncorrected machine check",
&panicm, mcestart);
}
}
/* notify userspace ASAP */
set_thread_flag(TIF_MCE_NOTIFY);
/* the last thing we do is clear state */
for (i = 0; i < banks; i++) {
if (test_bit(i, toclear))
wrmsrl(MSR_IA32_MC0_STATUS+4*i, 0);
}
wrmsrl(MSR_IA32_MCG_STATUS, 0);
out2:
if (worst > 0)
mce_report_event(regs);
mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
out:
atomic_dec(&mce_entry);
sync_core();
}
EXPORT_SYMBOL_GPL(do_machine_check);
/* dummy to break dependency. actual code is in mm/memory-failure.c */
void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
{
printk(KERN_ERR "Action optional memory failure at %lx ignored\n", pfn);
}
/*
* Called after mce notification in process context. This code
* is allowed to sleep. Call the high level VM handler to process
* any corrupted pages.
* Assume that the work queue code only calls this one at a time
* per CPU.
* Note we don't disable preemption, so this code might run on the wrong
* CPU. In this case the event is picked up by the scheduled work queue.
* This is merely a fast path to expedite processing in some common
* cases.
*/
void mce_notify_process(void)
{
unsigned long pfn;
mce_notify_irq();
while (mce_ring_get(&pfn))
memory_failure(pfn, MCE_VECTOR);
}
static void mce_process_work(struct work_struct *dummy)
{
mce_notify_process();
}
#ifdef CONFIG_X86_MCE_INTEL
/***
* mce_log_therm_throt_event - Logs the thermal throttling event to mcelog
......@@ -452,10 +1085,9 @@ void mce_log_therm_throt_event(__u64 status)
* poller finds an MCE, poll 2x faster. When the poller finds no more
* errors, poll 2x slower (up to check_interval seconds).
*/
static int check_interval = 5 * 60; /* 5 minutes */
static DEFINE_PER_CPU(int, next_interval); /* in jiffies */
static void mcheck_timer(unsigned long);
static DEFINE_PER_CPU(struct timer_list, mce_timer);
static void mcheck_timer(unsigned long data)
......@@ -465,20 +1097,20 @@ static void mcheck_timer(unsigned long data)
WARN_ON(smp_processor_id() != data);
if (mce_available(&current_cpu_data))
if (mce_available(&current_cpu_data)) {
machine_check_poll(MCP_TIMESTAMP,
&__get_cpu_var(mce_poll_banks));
}
/*
* Alert userspace if needed. If we logged an MCE, reduce the
* polling interval, otherwise increase the polling interval.
*/
n = &__get_cpu_var(next_interval);
if (mce_notify_user()) {
if (mce_notify_irq())
*n = max(*n/2, HZ/100);
} else {
else
*n = min(*n*2, (int)round_jiffies_relative(check_interval*HZ));
}
t->expires = jiffies + *n;
add_timer(t);
......@@ -496,12 +1128,13 @@ static DECLARE_WORK(mce_trigger_work, mce_do_trigger);
* Can be called from interrupt context, but not from machine check/NMI
* context.
*/
int mce_notify_user(void)
int mce_notify_irq(void)
{
/* Not more than two messages every minute */
static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
clear_thread_flag(TIF_MCE_NOTIFY);
if (test_and_clear_bit(0, &notify_user)) {
wake_up_interruptible(&mce_wait);
......@@ -520,39 +1153,21 @@ int mce_notify_user(void)
}
return 0;
}
/* see if the idle task needs to notify userspace */
static int
mce_idle_callback(struct notifier_block *nfb, unsigned long action, void *junk)
{
/* IDLE_END should be safe - interrupts are back on */
if (action == IDLE_END && test_thread_flag(TIF_MCE_NOTIFY))
mce_notify_user();
return NOTIFY_OK;
}
static struct notifier_block mce_idle_notifier = {
.notifier_call = mce_idle_callback,
};
static __init int periodic_mcheck_init(void)
{
idle_notifier_register(&mce_idle_notifier);
return 0;
}
__initcall(periodic_mcheck_init);
EXPORT_SYMBOL_GPL(mce_notify_irq);
/*
* Initialize Machine Checks for a CPU.
*/
static int mce_cap_init(void)
{
u64 cap;
unsigned b;
u64 cap;
rdmsrl(MSR_IA32_MCG_CAP, cap);
b = cap & 0xff;
b = cap & MCG_BANKCNT_MASK;
printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
if (b > MAX_NR_BANKS) {
printk(KERN_WARNING
"MCE: Using only %u machine check banks out of %u\n",
......@@ -571,17 +1186,20 @@ static int mce_cap_init(void)
}
/* Use accurate RIP reporting if available. */
if ((cap & (1<<9)) && ((cap >> 16) & 0xff) >= 9)
if ((cap & MCG_EXT_P) && MCG_EXT_CNT(cap) >= 9)
rip_msr = MSR_IA32_MCG_EIP;
if (cap & MCG_SER_P)
mce_ser = 1;
return 0;
}
static void mce_init(void *dummy)
static void mce_init(void)
{
mce_banks_t all_banks;
u64 cap;
int i;
mce_banks_t all_banks;
/*
* Log the machine checks left over from the previous reset.
......@@ -596,6 +1214,8 @@ static void mce_init(void *dummy)
wrmsr(MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff);
for (i = 0; i < banks; i++) {
if (skip_bank_init(i))
continue;
wrmsrl(MSR_IA32_MC0_CTL+4*i, bank[i]);
wrmsrl(MSR_IA32_MC0_STATUS+4*i, 0);
}
......@@ -606,16 +1226,69 @@ static void mce_cpu_quirks(struct cpuinfo_x86 *c)
{
/* This should be disabled by the BIOS, but isn't always */
if (c->x86_vendor == X86_VENDOR_AMD) {
if (c->x86 == 15 && banks > 4)
/* disable GART TBL walk error reporting, which trips off
incorrectly with the IOMMU & 3ware & Cerberus. */
if (c->x86 == 15 && banks > 4) {
/*
* disable GART TBL walk error reporting, which
* trips off incorrectly with the IOMMU & 3ware
* & Cerberus:
*/
clear_bit(10, (unsigned long *)&bank[4]);
if(c->x86 <= 17 && mce_bootlog < 0)
/* Lots of broken BIOS around that don't clear them
by default and leave crap in there. Don't log. */
}
if (c->x86 <= 17 && mce_bootlog < 0) {
/*
* Lots of broken BIOS around that don't clear them
* by default and leave crap in there. Don't log:
*/
mce_bootlog = 0;
}
/*
* Various K7s with broken bank 0 around. Always disable
* by default.
*/
if (c->x86 == 6)
bank[0] = 0;
}
if (c->x86_vendor == X86_VENDOR_INTEL) {
/*
* SDM documents that on family 6 bank 0 should not be written
* because it aliases to another special BIOS controlled
* register.
* But it's not aliased anymore on model 0x1a+
* Don't ignore bank 0 completely because there could be a
* valid event later, merely don't write CTL0.
*/
if (c->x86 == 6 && c->x86_model < 0x1A)
__set_bit(0, &dont_init_banks);
/*
* All newer Intel systems support MCE broadcasting. Enable
* synchronization with a one second timeout.
*/
if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
monarch_timeout < 0)
monarch_timeout = USEC_PER_SEC;
}
if (monarch_timeout < 0)
monarch_timeout = 0;
if (mce_bootlog != 0)
mce_panic_timeout = 30;
}
static void __cpuinit mce_ancient_init(struct cpuinfo_x86 *c)
{
if (c->x86 != 5)
return;
switch (c->x86_vendor) {
case X86_VENDOR_INTEL:
if (mce_p5_enabled())
intel_p5_mcheck_init(c);
break;
case X86_VENDOR_CENTAUR:
winchip_mcheck_init(c);
break;
}
}
static void mce_cpu_features(struct cpuinfo_x86 *c)
......@@ -637,6 +1310,9 @@ static void mce_init_timer(void)
struct timer_list *t = &__get_cpu_var(mce_timer);
int *n = &__get_cpu_var(next_interval);
if (mce_ignore_ce)
return;
*n = check_interval * HZ;
if (!*n)
return;
......@@ -647,22 +1323,30 @@ static void mce_init_timer(void)
/*
* Called for each booted CPU to set up machine checks.
* Must be called with preempt off.
* Must be called with preempt off:
*/
void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
{
if (mce_disabled)
return;
mce_ancient_init(c);
if (!mce_available(c))
return;
if (mce_cap_init() < 0) {
mce_dont_init = 1;
mce_disabled = 1;
return;
}
mce_cpu_quirks(c);
mce_init(NULL);
machine_check_vector = do_machine_check;
mce_init();
mce_cpu_features(c);
mce_init_timer();
INIT_WORK(&__get_cpu_var(mce_work), mce_process_work);
}
/*
......@@ -675,12 +1359,11 @@ static int open_exclu; /* already open exclusive? */
static int mce_open(struct inode *inode, struct file *file)
{
lock_kernel();
spin_lock(&mce_state_lock);
if (open_exclu || (open_count && (file->f_flags & O_EXCL))) {
spin_unlock(&mce_state_lock);
unlock_kernel();
return -EBUSY;
}
......@@ -689,7 +1372,6 @@ static int mce_open(struct inode *inode, struct file *file)
open_count++;
spin_unlock(&mce_state_lock);
unlock_kernel();
return nonseekable_open(inode, file);
}
......@@ -713,13 +1395,14 @@ static void collect_tscs(void *data)
rdtscll(cpu_tsc[smp_processor_id()]);
}
static DEFINE_MUTEX(mce_read_mutex);
static ssize_t mce_read(struct file *filp, char __user *ubuf, size_t usize,
loff_t *off)
{
char __user *buf = ubuf;
unsigned long *cpu_tsc;
static DEFINE_MUTEX(mce_read_mutex);
unsigned prev, next;
char __user *buf = ubuf;
int i, err;
cpu_tsc = kmalloc(nr_cpu_ids * sizeof(long), GFP_KERNEL);
......@@ -733,6 +1416,7 @@ static ssize_t mce_read(struct file *filp, char __user *ubuf, size_t usize,
if (*off != 0 || usize < MCE_LOG_LEN*sizeof(struct mce)) {
mutex_unlock(&mce_read_mutex);
kfree(cpu_tsc);
return -EINVAL;
}
......@@ -771,6 +1455,7 @@ static ssize_t mce_read(struct file *filp, char __user *ubuf, size_t usize,
* synchronize.
*/
on_each_cpu(collect_tscs, cpu_tsc, 1);
for (i = next; i < MCE_LOG_LEN; i++) {
if (mcelog.entry[i].finished &&
mcelog.entry[i].tsc < cpu_tsc[mcelog.entry[i].cpu]) {
......@@ -783,6 +1468,7 @@ static ssize_t mce_read(struct file *filp, char __user *ubuf, size_t usize,
}
mutex_unlock(&mce_read_mutex);
kfree(cpu_tsc);
return err ? -EFAULT : buf - ubuf;
}
......@@ -800,6 +1486,7 @@ static long mce_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
switch (cmd) {
case MCE_GET_RECORD_LEN:
return put_user(sizeof(struct mce), p);
......@@ -811,6 +1498,7 @@ static long mce_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
do {
flags = mcelog.flags;
} while (cmpxchg(&mcelog.flags, flags, 0) != flags);
return put_user(flags, p);
}
default:
......@@ -818,13 +1506,15 @@ static long mce_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
}
}
static const struct file_operations mce_chrdev_ops = {
/* Modified in mce-inject.c, so not static or const */
struct file_operations mce_chrdev_ops = {
.open = mce_open,
.release = mce_release,
.read = mce_read,
.poll = mce_poll,
.unlocked_ioctl = mce_ioctl,
};
EXPORT_SYMBOL_GPL(mce_chrdev_ops);
static struct miscdevice mce_log_device = {
MISC_MCELOG_MINOR,
......@@ -833,33 +1523,46 @@ static struct miscdevice mce_log_device = {
};
/*
* Old style boot options parsing. Only for compatibility.
* mce=off Disables machine check
* mce=no_cmci Disables CMCI
* mce=dont_log_ce Clears corrected events silently, no log created for CEs.
* mce=ignore_ce Disables polling and CMCI, corrected events are not cleared.
* mce=TOLERANCELEVEL[,monarchtimeout] (number, see above)
* monarchtimeout is how long to wait for other CPUs on machine
* check, or 0 to not wait
* mce=bootlog Log MCEs from before booting. Disabled by default on AMD.
* mce=nobootlog Don't log MCEs from before booting.
*/
static int __init mcheck_disable(char *str)
{
mce_dont_init = 1;
return 1;
}
/* mce=off disables machine check.
mce=TOLERANCELEVEL (number, see above)
mce=bootlog Log MCEs from before booting. Disabled by default on AMD.
mce=nobootlog Don't log MCEs from before booting. */
static int __init mcheck_enable(char *str)
{
if (*str == 0)
enable_p5_mce();
if (*str == '=')
str++;
if (!strcmp(str, "off"))
mce_dont_init = 1;
else if (!strcmp(str, "bootlog") || !strcmp(str,"nobootlog"))
mce_bootlog = str[0] == 'b';
else if (isdigit(str[0]))
mce_disabled = 1;
else if (!strcmp(str, "no_cmci"))
mce_cmci_disabled = 1;
else if (!strcmp(str, "dont_log_ce"))
mce_dont_log_ce = 1;
else if (!strcmp(str, "ignore_ce"))
mce_ignore_ce = 1;
else if (!strcmp(str, "bootlog") || !strcmp(str, "nobootlog"))
mce_bootlog = (str[0] == 'b');
else if (isdigit(str[0])) {
get_option(&str, &tolerant);
else
printk("mce= argument %s ignored. Please use /sys", str);
if (*str == ',') {
++str;
get_option(&str, &monarch_timeout);
}
} else {
printk(KERN_INFO "mce argument %s ignored. Please use /sys\n",
str);
return 0;
}
return 1;
}
__setup("nomce", mcheck_disable);
__setup("mce=", mcheck_enable);
__setup("mce", mcheck_enable);
/*
* Sysfs support
......@@ -873,8 +1576,10 @@ static int mce_disable(void)
{
int i;
for (i = 0; i < banks; i++)
for (i = 0; i < banks; i++) {
if (!skip_bank_init(i))
wrmsrl(MSR_IA32_MC0_CTL + i*4, 0);
}
return 0;
}
......@@ -888,13 +1593,16 @@ static int mce_shutdown(struct sys_device *dev)
return mce_disable();
}
/* On resume clear all MCE state. Don't want to see leftovers from the BIOS.
Only one CPU is active at this time, the others get readded later using
CPU hotplug. */
/*
* On resume clear all MCE state. Don't want to see leftovers from the BIOS.
* Only one CPU is active at this time, the others get re-added later using
* CPU hotplug:
*/
static int mce_resume(struct sys_device *dev)
{
mce_init(NULL);
mce_init();
mce_cpu_features(&current_cpu_data);
return 0;
}
......@@ -902,7 +1610,7 @@ static void mce_cpu_restart(void *data)
{
del_timer_sync(&__get_cpu_var(mce_timer));
if (mce_available(&current_cpu_data))
mce_init(NULL);
mce_init();
mce_init_timer();
}
......@@ -919,27 +1627,10 @@ static struct sysdev_class mce_sysclass = {
.name = "machinecheck",
};
DEFINE_PER_CPU(struct sys_device, device_mce);
void (*threshold_cpu_callback)(unsigned long action, unsigned int cpu) __cpuinitdata;
/* Why are there no generic functions for this? */
#define ACCESSOR(name, var, start) \
static ssize_t show_ ## name(struct sys_device *s, \
struct sysdev_attribute *attr, \
char *buf) { \
return sprintf(buf, "%lx\n", (unsigned long)var); \
} \
static ssize_t set_ ## name(struct sys_device *s, \
struct sysdev_attribute *attr, \
const char *buf, size_t siz) { \
char *end; \
unsigned long new = simple_strtoul(buf, &end, 0); \
if (end == buf) return -EINVAL; \
var = new; \
start; \
return end-buf; \
} \
static SYSDEV_ATTR(name, 0644, show_ ## name, set_ ## name);
DEFINE_PER_CPU(struct sys_device, mce_dev);
__cpuinitdata
void (*threshold_cpu_callback)(unsigned long action, unsigned int cpu);
static struct sysdev_attribute *bank_attrs;
......@@ -947,23 +1638,26 @@ static ssize_t show_bank(struct sys_device *s, struct sysdev_attribute *attr,
char *buf)
{
u64 b = bank[attr - bank_attrs];
return sprintf(buf, "%llx\n", b);
}
static ssize_t set_bank(struct sys_device *s, struct sysdev_attribute *attr,
const char *buf, size_t siz)
const char *buf, size_t size)
{
char *end;
u64 new = simple_strtoull(buf, &end, 0);
if (end == buf)
u64 new;
if (strict_strtoull(buf, 0, &new) < 0)
return -EINVAL;
bank[attr - bank_attrs] = new;
mce_restart();
return end-buf;
return size;
}
static ssize_t show_trigger(struct sys_device *s, struct sysdev_attribute *attr,
char *buf)
static ssize_t
show_trigger(struct sys_device *s, struct sysdev_attribute *attr, char *buf)
{
strcpy(buf, trigger);
strcat(buf, "\n");
......@@ -971,29 +1665,50 @@ static ssize_t show_trigger(struct sys_device *s, struct sysdev_attribute *attr,
}
static ssize_t set_trigger(struct sys_device *s, struct sysdev_attribute *attr,
const char *buf,size_t siz)
const char *buf, size_t siz)
{
char *p;
int len;
strncpy(trigger, buf, sizeof(trigger));
trigger[sizeof(trigger)-1] = 0;
len = strlen(trigger);
p = strchr(trigger, '\n');
if (*p) *p = 0;
if (*p)
*p = 0;
return len;
}
static ssize_t store_int_with_restart(struct sys_device *s,
struct sysdev_attribute *attr,
const char *buf, size_t size)
{
ssize_t ret = sysdev_store_int(s, attr, buf, size);
mce_restart();
return ret;
}
static SYSDEV_ATTR(trigger, 0644, show_trigger, set_trigger);
static SYSDEV_INT_ATTR(tolerant, 0644, tolerant);
ACCESSOR(check_interval,check_interval,mce_restart())
static struct sysdev_attribute *mce_attributes[] = {
&attr_tolerant.attr, &attr_check_interval, &attr_trigger,
static SYSDEV_INT_ATTR(monarch_timeout, 0644, monarch_timeout);
static struct sysdev_ext_attribute attr_check_interval = {
_SYSDEV_ATTR(check_interval, 0644, sysdev_show_int,
store_int_with_restart),
&check_interval
};
static struct sysdev_attribute *mce_attrs[] = {
&attr_tolerant.attr, &attr_check_interval.attr, &attr_trigger,
&attr_monarch_timeout.attr,
NULL
};
static cpumask_var_t mce_device_initialized;
static cpumask_var_t mce_dev_initialized;
/* Per cpu sysdev init. All of the cpus still share the same ctl bank */
/* Per cpu sysdev init. All of the cpus still share the same ctrl bank: */
static __cpuinit int mce_create_device(unsigned int cpu)
{
int err;
......@@ -1002,40 +1717,36 @@ static __cpuinit int mce_create_device(unsigned int cpu)
if (!mce_available(&boot_cpu_data))
return -EIO;
memset(&per_cpu(device_mce, cpu).kobj, 0, sizeof(struct kobject));
per_cpu(device_mce,cpu).id = cpu;
per_cpu(device_mce,cpu).cls = &mce_sysclass;
memset(&per_cpu(mce_dev, cpu).kobj, 0, sizeof(struct kobject));
per_cpu(mce_dev, cpu).id = cpu;
per_cpu(mce_dev, cpu).cls = &mce_sysclass;
err = sysdev_register(&per_cpu(device_mce,cpu));
err = sysdev_register(&per_cpu(mce_dev, cpu));
if (err)
return err;
for (i = 0; mce_attributes[i]; i++) {
err = sysdev_create_file(&per_cpu(device_mce,cpu),
mce_attributes[i]);
for (i = 0; mce_attrs[i]; i++) {
err = sysdev_create_file(&per_cpu(mce_dev, cpu), mce_attrs[i]);
if (err)
goto error;
}
for (i = 0; i < banks; i++) {
err = sysdev_create_file(&per_cpu(device_mce, cpu),
err = sysdev_create_file(&per_cpu(mce_dev, cpu),
&bank_attrs[i]);
if (err)
goto error2;
}
cpumask_set_cpu(cpu, mce_device_initialized);
cpumask_set_cpu(cpu, mce_dev_initialized);
return 0;
error2:
while (--i >= 0) {
sysdev_remove_file(&per_cpu(device_mce, cpu),
&bank_attrs[i]);
}
while (--i >= 0)
sysdev_remove_file(&per_cpu(mce_dev, cpu), &bank_attrs[i]);
error:
while (--i >= 0) {
sysdev_remove_file(&per_cpu(device_mce,cpu),
mce_attributes[i]);
}
sysdev_unregister(&per_cpu(device_mce,cpu));
while (--i >= 0)
sysdev_remove_file(&per_cpu(mce_dev, cpu), mce_attrs[i]);
sysdev_unregister(&per_cpu(mce_dev, cpu));
return err;
}
......@@ -1044,49 +1755,54 @@ static __cpuinit void mce_remove_device(unsigned int cpu)
{
int i;
if (!cpumask_test_cpu(cpu, mce_device_initialized))
if (!cpumask_test_cpu(cpu, mce_dev_initialized))
return;
for (i = 0; mce_attributes[i]; i++)
sysdev_remove_file(&per_cpu(device_mce,cpu),
mce_attributes[i]);
for (i = 0; mce_attrs[i]; i++)
sysdev_remove_file(&per_cpu(mce_dev, cpu), mce_attrs[i]);
for (i = 0; i < banks; i++)
sysdev_remove_file(&per_cpu(device_mce, cpu),
&bank_attrs[i]);
sysdev_unregister(&per_cpu(device_mce,cpu));
cpumask_clear_cpu(cpu, mce_device_initialized);
sysdev_remove_file(&per_cpu(mce_dev, cpu), &bank_attrs[i]);
sysdev_unregister(&per_cpu(mce_dev, cpu));
cpumask_clear_cpu(cpu, mce_dev_initialized);
}
/* Make sure there are no machine checks on offlined CPUs. */
static void mce_disable_cpu(void *h)
{
int i;
unsigned long action = *(unsigned long *)h;
int i;
if (!mce_available(&current_cpu_data))
return;
if (!(action & CPU_TASKS_FROZEN))
cmci_clear();
for (i = 0; i < banks; i++)
for (i = 0; i < banks; i++) {
if (!skip_bank_init(i))
wrmsrl(MSR_IA32_MC0_CTL + i*4, 0);
}
}
static void mce_reenable_cpu(void *h)
{
int i;
unsigned long action = *(unsigned long *)h;
int i;
if (!mce_available(&current_cpu_data))
return;
if (!(action & CPU_TASKS_FROZEN))
cmci_reenable();
for (i = 0; i < banks; i++)
for (i = 0; i < banks; i++) {
if (!skip_bank_init(i))
wrmsrl(MSR_IA32_MC0_CTL + i*4, bank[i]);
}
}
/* Get notified when a cpu comes on/off. Be hotplug friendly. */
static int __cpuinit mce_cpu_callback(struct notifier_block *nfb,
unsigned long action, void *hcpu)
static int __cpuinit
mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
{
unsigned int cpu = (unsigned long)hcpu;
struct timer_list *t = &per_cpu(mce_timer, cpu);
......@@ -1139,9 +1855,11 @@ static __init int mce_init_banks(void)
for (i = 0; i < banks; i++) {
struct sysdev_attribute *a = &bank_attrs[i];
a->attr.name = kasprintf(GFP_KERNEL, "bank%d", i);
if (!a->attr.name)
goto nomem;
a->attr.mode = 0644;
a->show = show_bank;
a->store = set_bank;
......@@ -1153,6 +1871,7 @@ static __init int mce_init_banks(void)
kfree(bank_attrs[i].attr.name);
kfree(bank_attrs);
bank_attrs = NULL;
return -ENOMEM;
}
......@@ -1164,7 +1883,7 @@ static __init int mce_init_device(void)
if (!mce_available(&boot_cpu_data))
return -EIO;
zalloc_cpumask_var(&mce_device_initialized, GFP_KERNEL);
alloc_cpumask_var(&mce_dev_initialized, GFP_KERNEL);
err = mce_init_banks();
if (err)
......@@ -1182,7 +1901,64 @@ static __init int mce_init_device(void)
register_hotcpu_notifier(&mce_cpu_notifier);
misc_register(&mce_log_device);
return err;
}
device_initcall(mce_init_device);
#else /* CONFIG_X86_OLD_MCE: */
int nr_mce_banks;
EXPORT_SYMBOL_GPL(nr_mce_banks); /* non-fatal.o */
/* This has to be run for each processor */
void mcheck_init(struct cpuinfo_x86 *c)
{
if (mce_disabled == 1)
return;
switch (c->x86_vendor) {
case X86_VENDOR_AMD:
amd_mcheck_init(c);
break;
case X86_VENDOR_INTEL:
if (c->x86 == 5)
intel_p5_mcheck_init(c);
if (c->x86 == 6)
intel_p6_mcheck_init(c);
if (c->x86 == 15)
intel_p4_mcheck_init(c);
break;
case X86_VENDOR_CENTAUR:
if (c->x86 == 5)
winchip_mcheck_init(c);
break;
default:
break;
}
printk(KERN_INFO "mce: CPU supports %d MCE banks\n", nr_mce_banks);
}
static int __init mcheck_enable(char *str)
{
mce_disabled = -1;
return 1;
}
__setup("mce", mcheck_enable);
#endif /* CONFIG_X86_OLD_MCE */
/*
* Old style boot options parsing. Only for compatibility.
*/
static int __init mcheck_disable(char *str)
{
mce_disabled = 1;
return 1;
}
__setup("nomce", mcheck_disable);
#include <linux/init.h>
#include <asm/mce.h>
#ifdef CONFIG_X86_OLD_MCE
void amd_mcheck_init(struct cpuinfo_x86 *c);
void intel_p4_mcheck_init(struct cpuinfo_x86 *c);
void intel_p5_mcheck_init(struct cpuinfo_x86 *c);
void intel_p6_mcheck_init(struct cpuinfo_x86 *c);
#endif
#ifdef CONFIG_X86_ANCIENT_MCE
void intel_p5_mcheck_init(struct cpuinfo_x86 *c);
void winchip_mcheck_init(struct cpuinfo_x86 *c);
extern int mce_p5_enable;
static inline int mce_p5_enabled(void) { return mce_p5_enable; }
static inline void enable_p5_mce(void) { mce_p5_enable = 1; }
#else
static inline void intel_p5_mcheck_init(struct cpuinfo_x86 *c) {}
static inline void winchip_mcheck_init(struct cpuinfo_x86 *c) {}
static inline int mce_p5_enabled(void) { return 0; }
static inline void enable_p5_mce(void) { }
#endif
/* Call the installed machine check handler for this CPU setup. */
extern void (*machine_check_vector)(struct pt_regs *, long error_code);
#ifdef CONFIG_X86_OLD_MCE
extern int nr_mce_banks;
void intel_set_thermal_handler(void);
#else
static inline void intel_set_thermal_handler(void) { }
#endif
void intel_init_thermal(struct cpuinfo_x86 *c);
/*
* mce.c - x86 Machine Check Exception Reporting
* (c) 2002 Alan Cox <alan@lxorguk.ukuu.org.uk>, Dave Jones <davej@redhat.com>
*/
#include <linux/init.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/smp.h>
#include <linux/thread_info.h>
#include <asm/processor.h>
#include <asm/system.h>
#include <asm/mce.h>
#include "mce.h"
int mce_disabled;
int nr_mce_banks;
EXPORT_SYMBOL_GPL(nr_mce_banks); /* non-fatal.o */
/* Handle unconfigured int18 (should never happen) */
static void unexpected_machine_check(struct pt_regs *regs, long error_code)
{
printk(KERN_ERR "CPU#%d: Unexpected int18 (Machine Check).\n", smp_processor_id());
}
/* Call the installed machine check handler for this CPU setup. */
void (*machine_check_vector)(struct pt_regs *, long error_code) = unexpected_machine_check;
/* This has to be run for each processor */
void mcheck_init(struct cpuinfo_x86 *c)
{
if (mce_disabled == 1)
return;
switch (c->x86_vendor) {
case X86_VENDOR_AMD:
amd_mcheck_init(c);
break;
case X86_VENDOR_INTEL:
if (c->x86 == 5)
intel_p5_mcheck_init(c);
if (c->x86 == 6)
intel_p6_mcheck_init(c);
if (c->x86 == 15)
intel_p4_mcheck_init(c);
break;
case X86_VENDOR_CENTAUR:
if (c->x86 == 5)
winchip_mcheck_init(c);
break;
default:
break;
}
}
static int __init mcheck_disable(char *str)
{
mce_disabled = 1;
return 1;
}
static int __init mcheck_enable(char *str)
{
mce_disabled = -1;
return 1;
}
__setup("nomce", mcheck_disable);
__setup("mce", mcheck_enable);
......@@ -13,22 +13,22 @@
*
* All MC4_MISCi registers are shared between multi-cores
*/
#include <linux/cpu.h>
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/kobject.h>
#include <linux/notifier.h>
#include <linux/sched.h>
#include <linux/smp.h>
#include <linux/kobject.h>
#include <linux/percpu.h>
#include <linux/sysdev.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/sysfs.h>
#include <linux/init.h>
#include <linux/cpu.h>
#include <linux/smp.h>
#include <asm/apic.h>
#include <asm/idle.h>
#include <asm/mce.h>
#include <asm/msr.h>
#include <asm/percpu.h>
#include <asm/idle.h>
#define PFX "mce_threshold: "
#define VERSION "version 1.1.1"
......@@ -110,6 +110,7 @@ static void threshold_restart_bank(void *_tr)
} else if (tr->old_limit) { /* change limit w/o reset */
int new_count = (mci_misc_hi & THRESHOLD_MAX) +
(tr->old_limit - tr->b->threshold_limit);
mci_misc_hi = (mci_misc_hi & ~MASK_ERR_COUNT_HI) |
(new_count & THRESHOLD_MAX);
}
......@@ -125,11 +126,11 @@ static void threshold_restart_bank(void *_tr)
/* cpu init entry point, called from mce.c with preempt off */
void mce_amd_feature_init(struct cpuinfo_x86 *c)
{
unsigned int bank, block;
unsigned int cpu = smp_processor_id();
u8 lvt_off;
u32 low = 0, high = 0, address = 0;
unsigned int bank, block;
struct thresh_restart tr;
u8 lvt_off;
for (bank = 0; bank < NR_BANKS; ++bank) {
for (block = 0; block < NR_BLOCKS; ++block) {
......@@ -140,8 +141,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
if (!address)
break;
address += MCG_XBLK_ADDR;
}
else
} else
++address;
if (rdmsr_safe(address, &low, &high))
......@@ -193,9 +193,9 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
*/
static void amd_threshold_interrupt(void)
{
u32 low = 0, high = 0, address = 0;
unsigned int bank, block;
struct mce m;
u32 low = 0, high = 0, address = 0;
mce_setup(&m);
......@@ -204,16 +204,16 @@ static void amd_threshold_interrupt(void)
if (!(per_cpu(bank_map, m.cpu) & (1 << bank)))
continue;
for (block = 0; block < NR_BLOCKS; ++block) {
if (block == 0)
if (block == 0) {
address = MSR_IA32_MC0_MISC + bank * 4;
else if (block == 1) {
} else if (block == 1) {
address = (low & MASK_BLKPTR_LO) >> 21;
if (!address)
break;
address += MCG_XBLK_ADDR;
}
else
} else {
++address;
}
if (rdmsr_safe(address, &low, &high))
break;
......@@ -229,8 +229,10 @@ static void amd_threshold_interrupt(void)
(high & MASK_LOCKED_HI))
continue;
/* Log the machine check that caused the threshold
event. */
/*
* Log the machine check that caused the threshold
* event.
*/
machine_check_poll(MCP_TIMESTAMP,
&__get_cpu_var(mce_poll_banks));
......@@ -254,48 +256,52 @@ static void amd_threshold_interrupt(void)
struct threshold_attr {
struct attribute attr;
ssize_t(*show) (struct threshold_block *, char *);
ssize_t(*store) (struct threshold_block *, const char *, size_t count);
ssize_t (*show) (struct threshold_block *, char *);
ssize_t (*store) (struct threshold_block *, const char *, size_t count);
};
#define SHOW_FIELDS(name) \
static ssize_t show_ ## name(struct threshold_block * b, char *buf) \
static ssize_t show_ ## name(struct threshold_block *b, char *buf) \
{ \
return sprintf(buf, "%lx\n", (unsigned long) b->name); \
}
SHOW_FIELDS(interrupt_enable)
SHOW_FIELDS(threshold_limit)
static ssize_t store_interrupt_enable(struct threshold_block *b,
const char *buf, size_t count)
static ssize_t
store_interrupt_enable(struct threshold_block *b, const char *buf, size_t size)
{
char *end;
struct thresh_restart tr;
unsigned long new = simple_strtoul(buf, &end, 0);
if (end == buf)
unsigned long new;
if (strict_strtoul(buf, 0, &new) < 0)
return -EINVAL;
b->interrupt_enable = !!new;
tr.b = b;
tr.reset = 0;
tr.old_limit = 0;
smp_call_function_single(b->cpu, threshold_restart_bank, &tr, 1);
return end - buf;
return size;
}
static ssize_t store_threshold_limit(struct threshold_block *b,
const char *buf, size_t count)
static ssize_t
store_threshold_limit(struct threshold_block *b, const char *buf, size_t size)
{
char *end;
struct thresh_restart tr;
unsigned long new = simple_strtoul(buf, &end, 0);
if (end == buf)
unsigned long new;
if (strict_strtoul(buf, 0, &new) < 0)
return -EINVAL;
if (new > THRESHOLD_MAX)
new = THRESHOLD_MAX;
if (new < 1)
new = 1;
tr.old_limit = b->threshold_limit;
b->threshold_limit = new;
tr.b = b;
......@@ -303,7 +309,7 @@ static ssize_t store_threshold_limit(struct threshold_block *b,
smp_call_function_single(b->cpu, threshold_restart_bank, &tr, 1);
return end - buf;
return size;
}
struct threshold_block_cross_cpu {
......@@ -338,16 +344,13 @@ static ssize_t store_error_count(struct threshold_block *b,
return 1;
}
#define THRESHOLD_ATTR(_name,_mode,_show,_store) { \
.attr = {.name = __stringify(_name), .mode = _mode }, \
.show = _show, \
.store = _store, \
#define RW_ATTR(val) \
static struct threshold_attr val = { \
.attr = {.name = __stringify(val), .mode = 0644 }, \
.show = show_## val, \
.store = store_## val, \
};
#define RW_ATTR(name) \
static struct threshold_attr name = \
THRESHOLD_ATTR(name, 0644, show_## name, store_## name)
RW_ATTR(interrupt_enable);
RW_ATTR(threshold_limit);
RW_ATTR(error_count);
......@@ -367,7 +370,9 @@ static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)
struct threshold_block *b = to_block(kobj);
struct threshold_attr *a = to_attr(attr);
ssize_t ret;
ret = a->show ? a->show(b, buf) : -EIO;
return ret;
}
......@@ -377,7 +382,9 @@ static ssize_t store(struct kobject *kobj, struct attribute *attr,
struct threshold_block *b = to_block(kobj);
struct threshold_attr *a = to_attr(attr);
ssize_t ret;
ret = a->store ? a->store(b, buf, count) : -EIO;
return ret;
}
......@@ -396,9 +403,9 @@ static __cpuinit int allocate_threshold_blocks(unsigned int cpu,
unsigned int block,
u32 address)
{
int err;
u32 low, high;
struct threshold_block *b = NULL;
u32 low, high;
int err;
if ((bank >= NR_BANKS) || (block >= NR_BLOCKS))
return 0;
......@@ -430,11 +437,12 @@ static __cpuinit int allocate_threshold_blocks(unsigned int cpu,
INIT_LIST_HEAD(&b->miscj);
if (per_cpu(threshold_banks, cpu)[bank]->blocks)
if (per_cpu(threshold_banks, cpu)[bank]->blocks) {
list_add(&b->miscj,
&per_cpu(threshold_banks, cpu)[bank]->blocks->miscj);
else
} else {
per_cpu(threshold_banks, cpu)[bank]->blocks = b;
}
err = kobject_init_and_add(&b->kobj, &threshold_ktype,
per_cpu(threshold_banks, cpu)[bank]->kobj,
......@@ -447,8 +455,9 @@ static __cpuinit int allocate_threshold_blocks(unsigned int cpu,
if (!address)
return 0;
address += MCG_XBLK_ADDR;
} else
} else {
++address;
}
err = allocate_threshold_blocks(cpu, bank, ++block, address);
if (err)
......@@ -500,13 +509,14 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
if (!b)
goto out;
err = sysfs_create_link(&per_cpu(device_mce, cpu).kobj,
err = sysfs_create_link(&per_cpu(mce_dev, cpu).kobj,
b->kobj, name);
if (err)
goto out;
cpumask_copy(b->cpus, cpu_core_mask(cpu));
per_cpu(threshold_banks, cpu)[bank] = b;
goto out;
}
#endif
......@@ -522,7 +532,7 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
goto out;
}
b->kobj = kobject_create_and_add(name, &per_cpu(device_mce, cpu).kobj);
b->kobj = kobject_create_and_add(name, &per_cpu(mce_dev, cpu).kobj);
if (!b->kobj)
goto out_free;
......@@ -542,7 +552,7 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
if (i == cpu)
continue;
err = sysfs_create_link(&per_cpu(device_mce, i).kobj,
err = sysfs_create_link(&per_cpu(mce_dev, i).kobj,
b->kobj, name);
if (err)
goto out;
......@@ -605,15 +615,13 @@ static void deallocate_threshold_block(unsigned int cpu,
static void threshold_remove_bank(unsigned int cpu, int bank)
{
int i = 0;
struct threshold_bank *b;
char name[32];
int i = 0;
b = per_cpu(threshold_banks, cpu)[bank];
if (!b)
return;
if (!b->blocks)
goto free_out;
......@@ -622,8 +630,9 @@ static void threshold_remove_bank(unsigned int cpu, int bank)
#ifdef CONFIG_SMP
/* sibling symlink */
if (shared_bank[bank] && b->blocks->cpu != cpu) {
sysfs_remove_link(&per_cpu(device_mce, cpu).kobj, name);
sysfs_remove_link(&per_cpu(mce_dev, cpu).kobj, name);
per_cpu(threshold_banks, cpu)[bank] = NULL;
return;
}
#endif
......@@ -633,7 +642,7 @@ static void threshold_remove_bank(unsigned int cpu, int bank)
if (i == cpu)
continue;
sysfs_remove_link(&per_cpu(device_mce, i).kobj, name);
sysfs_remove_link(&per_cpu(mce_dev, i).kobj, name);
per_cpu(threshold_banks, i)[bank] = NULL;
}
......@@ -659,12 +668,9 @@ static void threshold_remove_device(unsigned int cpu)
}
/* get notified when a cpu comes on/off */
static void __cpuinit amd_64_threshold_cpu_callback(unsigned long action,
unsigned int cpu)
static void __cpuinit
amd_64_threshold_cpu_callback(unsigned long action, unsigned int cpu)
{
if (cpu >= NR_CPUS)
return;
switch (action) {
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
......@@ -686,11 +692,12 @@ static __init int threshold_init_device(void)
/* to hit CPUs online before the notifier is up */
for_each_online_cpu(lcpu) {
int err = threshold_create_device(lcpu);
if (err)
return err;
}
threshold_cpu_callback = amd_64_threshold_cpu_callback;
return 0;
}
device_initcall(threshold_init_device);
/*
* Common code for Intel machine checks
*/
#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/smp.h>
#include <asm/therm_throt.h>
#include <asm/processor.h>
#include <asm/system.h>
#include <asm/apic.h>
#include <asm/msr.h>
#include "mce.h"
void intel_init_thermal(struct cpuinfo_x86 *c)
{
unsigned int cpu = smp_processor_id();
int tm2 = 0;
u32 l, h;
/* Thermal monitoring depends on ACPI and clock modulation*/
if (!cpu_has(c, X86_FEATURE_ACPI) || !cpu_has(c, X86_FEATURE_ACC))
return;
/*
* First check if its enabled already, in which case there might
* be some SMM goo which handles it, so we can't even put a handler
* since it might be delivered via SMI already:
*/
rdmsr(MSR_IA32_MISC_ENABLE, l, h);
h = apic_read(APIC_LVTTHMR);
if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
printk(KERN_DEBUG
"CPU%d: Thermal monitoring handled by SMI\n", cpu);
return;
}
if (cpu_has(c, X86_FEATURE_TM2) && (l & MSR_IA32_MISC_ENABLE_TM2))
tm2 = 1;
/* Check whether a vector already exists */
if (h & APIC_VECTOR_MASK) {
printk(KERN_DEBUG
"CPU%d: Thermal LVT vector (%#x) already installed\n",
cpu, (h & APIC_VECTOR_MASK));
return;
}
/* We'll mask the thermal vector in the lapic till we're ready: */
h = THERMAL_APIC_VECTOR | APIC_DM_FIXED | APIC_LVT_MASKED;
apic_write(APIC_LVTTHMR, h);
rdmsr(MSR_IA32_THERM_INTERRUPT, l, h);
wrmsr(MSR_IA32_THERM_INTERRUPT,
l | (THERM_INT_LOW_ENABLE | THERM_INT_HIGH_ENABLE), h);
intel_set_thermal_handler();
rdmsr(MSR_IA32_MISC_ENABLE, l, h);
wrmsr(MSR_IA32_MISC_ENABLE, l | MSR_IA32_MISC_ENABLE_TM1, h);
/* Unmask the thermal vector: */
l = apic_read(APIC_LVTTHMR);
apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
printk(KERN_INFO "CPU%d: Thermal monitoring enabled (%s)\n",
cpu, tm2 ? "TM2" : "TM1");
/* enable thermal throttle processing */
atomic_set(&therm_throt_en, 1);
}
......@@ -16,6 +16,8 @@
#include <asm/idle.h>
#include <asm/therm_throt.h>
#include "mce.h"
asmlinkage void smp_thermal_interrupt(void)
{
__u64 msr_val;
......@@ -26,67 +28,13 @@ asmlinkage void smp_thermal_interrupt(void)
irq_enter();
rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
if (therm_throt_process(msr_val & 1))
if (therm_throt_process(msr_val & THERM_STATUS_PROCHOT))
mce_log_therm_throt_event(msr_val);
inc_irq_stat(irq_thermal_count);
irq_exit();
}
static void intel_init_thermal(struct cpuinfo_x86 *c)
{
u32 l, h;
int tm2 = 0;
unsigned int cpu = smp_processor_id();
if (!cpu_has(c, X86_FEATURE_ACPI))
return;
if (!cpu_has(c, X86_FEATURE_ACC))
return;
/* first check if TM1 is already enabled by the BIOS, in which
* case there might be some SMM goo which handles it, so we can't even
* put a handler since it might be delivered via SMI already.
*/
rdmsr(MSR_IA32_MISC_ENABLE, l, h);
h = apic_read(APIC_LVTTHMR);
if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
printk(KERN_DEBUG
"CPU%d: Thermal monitoring handled by SMI\n", cpu);
return;
}
if (cpu_has(c, X86_FEATURE_TM2) && (l & MSR_IA32_MISC_ENABLE_TM2))
tm2 = 1;
if (h & APIC_VECTOR_MASK) {
printk(KERN_DEBUG
"CPU%d: Thermal LVT vector (%#x) already "
"installed\n", cpu, (h & APIC_VECTOR_MASK));
return;
}
h = THERMAL_APIC_VECTOR;
h |= (APIC_DM_FIXED | APIC_LVT_MASKED);
apic_write(APIC_LVTTHMR, h);
rdmsr(MSR_IA32_THERM_INTERRUPT, l, h);
wrmsr(MSR_IA32_THERM_INTERRUPT, l | 0x03, h);
rdmsr(MSR_IA32_MISC_ENABLE, l, h);
wrmsr(MSR_IA32_MISC_ENABLE, l | MSR_IA32_MISC_ENABLE_TM1, h);
l = apic_read(APIC_LVTTHMR);
apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
printk(KERN_INFO "CPU%d: Thermal monitoring enabled (%s)\n",
cpu, tm2 ? "TM2" : "TM1");
/* enable thermal throttle processing */
atomic_set(&therm_throt_en, 1);
return;
}
/*
* Support for Intel Correct Machine Check Interrupts. This allows
* the CPU to raise an interrupt when a corrected machine check happened.
......@@ -108,6 +56,9 @@ static int cmci_supported(int *banks)
{
u64 cap;
if (mce_cmci_disabled || mce_ignore_ce)
return 0;
/*
* Vendor check is not strictly needed, but the initial
* initialization is vendor keyed and this
......@@ -131,7 +82,7 @@ static int cmci_supported(int *banks)
static void intel_threshold_interrupt(void)
{
machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
mce_notify_user();
mce_notify_irq();
}
static void print_update(char *type, int *hdr, int num)
......@@ -247,7 +198,7 @@ void cmci_rediscover(int dying)
return;
cpumask_copy(old, &current->cpus_allowed);
for_each_online_cpu (cpu) {
for_each_online_cpu(cpu) {
if (cpu == dying)
continue;
if (set_cpus_allowed_ptr(current, cpumask_of(cpu)))
......
......@@ -6,15 +6,14 @@
* This file contains routines to check for non-fatal MCEs every 15s
*
*/
#include <linux/init.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/jiffies.h>
#include <linux/workqueue.h>
#include <linux/interrupt.h>
#include <linux/smp.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/smp.h>
#include <asm/processor.h>
#include <asm/system.h>
......@@ -24,7 +23,7 @@
static int firstbank;
#define MCE_RATE 15*HZ /* timer rate is 15s */
#define MCE_RATE (15*HZ) /* timer rate is 15s */
static void mce_checkregs(void *info)
{
......@@ -34,24 +33,25 @@ static void mce_checkregs(void *info)
for (i = firstbank; i < nr_mce_banks; i++) {
rdmsr(MSR_IA32_MC0_STATUS+i*4, low, high);
if (high & (1<<31)) {
printk(KERN_INFO "MCE: The hardware reports a non "
"fatal, correctable incident occurred on "
"CPU %d.\n",
if (!(high & (1<<31)))
continue;
printk(KERN_INFO "MCE: The hardware reports a non fatal, "
"correctable incident occurred on CPU %d.\n",
smp_processor_id());
printk(KERN_INFO "Bank %d: %08x%08x\n", i, high, low);
/*
* Scrub the error so we don't pick it up in MCE_RATE
* seconds time.
* seconds time:
*/
wrmsr(MSR_IA32_MC0_STATUS+i*4, 0UL, 0UL);
/* Serialize */
/* Serialize: */
wmb();
add_taint(TAINT_MACHINE_CHECK);
}
}
}
static void mce_work_fn(struct work_struct *work);
......@@ -87,6 +87,7 @@ static int __init init_nonfatal_mce_checker(void)
*/
schedule_delayed_work(&mce_work, round_jiffies_relative(MCE_RATE));
printk(KERN_INFO "Machine check exception polling timer started.\n");
return 0;
}
module_init(init_nonfatal_mce_checker);
......
......@@ -2,18 +2,17 @@
* P4 specific Machine Check Exception Reporting
*/
#include <linux/init.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/smp.h>
#include <asm/therm_throt.h>
#include <asm/processor.h>
#include <asm/system.h>
#include <asm/msr.h>
#include <asm/apic.h>
#include <asm/therm_throt.h>
#include <asm/msr.h>
#include "mce.h"
......@@ -36,6 +35,7 @@ static int mce_num_extended_msrs;
#ifdef CONFIG_X86_MCE_P4THERMAL
static void unexpected_thermal_interrupt(struct pt_regs *regs)
{
printk(KERN_ERR "CPU%d: Unexpected LVT TMR interrupt!\n",
......@@ -43,7 +43,7 @@ static void unexpected_thermal_interrupt(struct pt_regs *regs)
add_taint(TAINT_MACHINE_CHECK);
}
/* P4/Xeon Thermal transition interrupt handler */
/* P4/Xeon Thermal transition interrupt handler: */
static void intel_thermal_interrupt(struct pt_regs *regs)
{
__u64 msr_val;
......@@ -51,11 +51,12 @@ static void intel_thermal_interrupt(struct pt_regs *regs)
ack_APIC_irq();
rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
therm_throt_process(msr_val & 0x1);
therm_throt_process(msr_val & THERM_STATUS_PROCHOT);
}
/* Thermal interrupt handler for this CPU setup */
static void (*vendor_thermal_interrupt)(struct pt_regs *regs) = unexpected_thermal_interrupt;
/* Thermal interrupt handler for this CPU setup: */
static void (*vendor_thermal_interrupt)(struct pt_regs *regs) =
unexpected_thermal_interrupt;
void smp_thermal_interrupt(struct pt_regs *regs)
{
......@@ -65,67 +66,15 @@ void smp_thermal_interrupt(struct pt_regs *regs)
irq_exit();
}
/* P4/Xeon Thermal regulation detect and init */
static void intel_init_thermal(struct cpuinfo_x86 *c)
void intel_set_thermal_handler(void)
{
u32 l, h;
unsigned int cpu = smp_processor_id();
/* Thermal monitoring */
if (!cpu_has(c, X86_FEATURE_ACPI))
return; /* -ENODEV */
/* Clock modulation */
if (!cpu_has(c, X86_FEATURE_ACC))
return; /* -ENODEV */
/* first check if its enabled already, in which case there might
* be some SMM goo which handles it, so we can't even put a handler
* since it might be delivered via SMI already -zwanem.
*/
rdmsr(MSR_IA32_MISC_ENABLE, l, h);
h = apic_read(APIC_LVTTHMR);
if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
printk(KERN_DEBUG "CPU%d: Thermal monitoring handled by SMI\n",
cpu);
return; /* -EBUSY */
}
/* check whether a vector already exists, temporarily masked? */
if (h & APIC_VECTOR_MASK) {
printk(KERN_DEBUG "CPU%d: Thermal LVT vector (%#x) already "
"installed\n",
cpu, (h & APIC_VECTOR_MASK));
return; /* -EBUSY */
}
/* The temperature transition interrupt handler setup */
h = THERMAL_APIC_VECTOR; /* our delivery vector */
h |= (APIC_DM_FIXED | APIC_LVT_MASKED); /* we'll mask till we're ready */
apic_write(APIC_LVTTHMR, h);
rdmsr(MSR_IA32_THERM_INTERRUPT, l, h);
wrmsr(MSR_IA32_THERM_INTERRUPT, l | 0x03 , h);
/* ok we're good to go... */
vendor_thermal_interrupt = intel_thermal_interrupt;
rdmsr(MSR_IA32_MISC_ENABLE, l, h);
wrmsr(MSR_IA32_MISC_ENABLE, l | MSR_IA32_MISC_ENABLE_TM1, h);
l = apic_read(APIC_LVTTHMR);
apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
printk(KERN_INFO "CPU%d: Thermal monitoring enabled\n", cpu);
/* enable thermal throttle processing */
atomic_set(&therm_throt_en, 1);
return;
}
#endif /* CONFIG_X86_MCE_P4THERMAL */
#endif /* CONFIG_X86_MCE_P4THERMAL */
/* P4/Xeon Extended MCE MSR retrieval, return 0 if unsupported */
static inline void intel_get_extended_msrs(struct intel_mce_extended_msrs *r)
static void intel_get_extended_msrs(struct intel_mce_extended_msrs *r)
{
u32 h;
......@@ -143,9 +92,9 @@ static inline void intel_get_extended_msrs(struct intel_mce_extended_msrs *r)
static void intel_machine_check(struct pt_regs *regs, long error_code)
{
int recover = 1;
u32 alow, ahigh, high, low;
u32 mcgstl, mcgsth;
int recover = 1;
int i;
rdmsr(MSR_IA32_MCG_STATUS, mcgstl, mcgsth);
......@@ -157,7 +106,9 @@ static void intel_machine_check(struct pt_regs *regs, long error_code)
if (mce_num_extended_msrs > 0) {
struct intel_mce_extended_msrs dbg;
intel_get_extended_msrs(&dbg);
printk(KERN_DEBUG "CPU %d: EIP: %08x EFLAGS: %08x\n"
"\teax: %08x ebx: %08x ecx: %08x edx: %08x\n"
"\tesi: %08x edi: %08x ebp: %08x esp: %08x\n",
......@@ -171,6 +122,7 @@ static void intel_machine_check(struct pt_regs *regs, long error_code)
if (high & (1<<31)) {
char misc[20];
char addr[24];
misc[0] = addr[0] = '\0';
if (high & (1<<29))
recover |= 1;
......@@ -196,6 +148,7 @@ static void intel_machine_check(struct pt_regs *regs, long error_code)
panic("Unable to continue");
printk(KERN_EMERG "Attempting to continue.\n");
/*
* Do not clear the MSR_IA32_MCi_STATUS if the error is not
* recoverable/continuable.This will allow BIOS to look at the MSRs
......@@ -217,7 +170,6 @@ static void intel_machine_check(struct pt_regs *regs, long error_code)
wrmsr(MSR_IA32_MCG_STATUS, mcgstl, mcgsth);
}
void intel_p4_mcheck_init(struct cpuinfo_x86 *c)
{
u32 l, h;
......
......@@ -2,11 +2,10 @@
* P5 specific Machine Check Exception Reporting
* (C) Copyright 2002 Alan Cox <alan@lxorguk.ukuu.org.uk>
*/
#include <linux/init.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/smp.h>
#include <asm/processor.h>
......@@ -15,39 +14,58 @@
#include "mce.h"
/* Machine check handler for Pentium class Intel */
/* By default disabled */
int mce_p5_enable;
/* Machine check handler for Pentium class Intel CPUs: */
static void pentium_machine_check(struct pt_regs *regs, long error_code)
{
u32 loaddr, hi, lotype;
rdmsr(MSR_IA32_P5_MC_ADDR, loaddr, hi);
rdmsr(MSR_IA32_P5_MC_TYPE, lotype, hi);
printk(KERN_EMERG "CPU#%d: Machine Check Exception: 0x%8X (type 0x%8X).\n", smp_processor_id(), loaddr, lotype);
if (lotype&(1<<5))
printk(KERN_EMERG "CPU#%d: Possible thermal failure (CPU on fire ?).\n", smp_processor_id());
printk(KERN_EMERG
"CPU#%d: Machine Check Exception: 0x%8X (type 0x%8X).\n",
smp_processor_id(), loaddr, lotype);
if (lotype & (1<<5)) {
printk(KERN_EMERG
"CPU#%d: Possible thermal failure (CPU on fire ?).\n",
smp_processor_id());
}
add_taint(TAINT_MACHINE_CHECK);
}
/* Set up machine check reporting for processors with Intel style MCE */
/* Set up machine check reporting for processors with Intel style MCE: */
void intel_p5_mcheck_init(struct cpuinfo_x86 *c)
{
u32 l, h;
/*Check for MCE support */
/* Check for MCE support: */
if (!cpu_has(c, X86_FEATURE_MCE))
return;
/* Default P5 to off as its often misconnected */
#ifdef CONFIG_X86_OLD_MCE
/* Default P5 to off as its often misconnected: */
if (mce_disabled != -1)
return;
#endif
machine_check_vector = pentium_machine_check;
/* Make sure the vector pointer is visible before we enable MCEs: */
wmb();
/* Read registers before enabling */
/* Read registers before enabling: */
rdmsr(MSR_IA32_P5_MC_ADDR, l, h);
rdmsr(MSR_IA32_P5_MC_TYPE, l, h);
printk(KERN_INFO "Intel old style machine check architecture supported.\n");
printk(KERN_INFO
"Intel old style machine check architecture supported.\n");
/* Enable MCE */
/* Enable MCE: */
set_in_cr4(X86_CR4_MCE);
printk(KERN_INFO "Intel old style machine check reporting enabled on CPU#%d.\n", smp_processor_id());
printk(KERN_INFO
"Intel old style machine check reporting enabled on CPU#%d.\n",
smp_processor_id());
}
......@@ -2,11 +2,10 @@
* P6 specific Machine Check Exception Reporting
* (C) Copyright 2002 Alan Cox <alan@lxorguk.ukuu.org.uk>
*/
#include <linux/init.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/smp.h>
#include <asm/processor.h>
......@@ -18,9 +17,9 @@
/* Machine Check Handler For PII/PIII */
static void intel_machine_check(struct pt_regs *regs, long error_code)
{
int recover = 1;
u32 alow, ahigh, high, low;
u32 mcgstl, mcgsth;
int recover = 1;
int i;
rdmsr(MSR_IA32_MCG_STATUS, mcgstl, mcgsth);
......@@ -35,12 +34,16 @@ static void intel_machine_check(struct pt_regs *regs, long error_code)
if (high & (1<<31)) {
char misc[20];
char addr[24];
misc[0] = addr[0] = '\0';
misc[0] = '\0';
addr[0] = '\0';
if (high & (1<<29))
recover |= 1;
if (high & (1<<25))
recover |= 2;
high &= ~(1<<31);
if (high & (1<<27)) {
rdmsr(MSR_IA32_MC0_MISC+i*4, alow, ahigh);
snprintf(misc, 20, "[%08x%08x]", ahigh, alow);
......@@ -49,6 +52,7 @@ static void intel_machine_check(struct pt_regs *regs, long error_code)
rdmsr(MSR_IA32_MC0_ADDR+i*4, alow, ahigh);
snprintf(addr, 24, " at %08x%08x", ahigh, alow);
}
printk(KERN_EMERG "CPU %d: Bank %d: %08x%08x%s%s\n",
smp_processor_id(), i, high, low, misc, addr);
}
......@@ -63,16 +67,17 @@ static void intel_machine_check(struct pt_regs *regs, long error_code)
/*
* Do not clear the MSR_IA32_MCi_STATUS if the error is not
* recoverable/continuable.This will allow BIOS to look at the MSRs
* for errors if the OS could not log the error.
* for errors if the OS could not log the error:
*/
for (i = 0; i < nr_mce_banks; i++) {
unsigned int msr;
msr = MSR_IA32_MC0_STATUS+i*4;
rdmsr(msr, low, high);
if (high & (1<<31)) {
/* Clear it */
/* Clear it: */
wrmsr(msr, 0UL, 0UL);
/* Serialize */
/* Serialize: */
wmb();
add_taint(TAINT_MACHINE_CHECK);
}
......@@ -81,7 +86,7 @@ static void intel_machine_check(struct pt_regs *regs, long error_code)
wrmsr(MSR_IA32_MCG_STATUS, mcgstl, mcgsth);
}
/* Set up machine check reporting for processors with Intel style MCE */
/* Set up machine check reporting for processors with Intel style MCE: */
void intel_p6_mcheck_init(struct cpuinfo_x86 *c)
{
u32 l, h;
......@@ -97,6 +102,7 @@ void intel_p6_mcheck_init(struct cpuinfo_x86 *c)
/* Ok machine check is available */
machine_check_vector = intel_machine_check;
/* Make sure the vector pointer is visible before we enable MCEs: */
wmb();
printk(KERN_INFO "Intel machine check architecture supported.\n");
......
/*
*
* Thermal throttle event support code (such as syslog messaging and rate
* limiting) that was factored out from x86_64 (mce_intel.c) and i386 (p4.c).
*
* This allows consistent reporting of CPU thermal throttle events.
*
* Maintains a counter in /sys that keeps track of the number of thermal
......@@ -13,13 +13,12 @@
* Credits: Adapted from Zwane Mwaikambo's original code in mce_intel.c.
* Inspired by Ross Biro's and Al Borchers' counter code.
*/
#include <linux/notifier.h>
#include <linux/jiffies.h>
#include <linux/percpu.h>
#include <linux/sysdev.h>
#include <linux/cpu.h>
#include <asm/cpu.h>
#include <linux/notifier.h>
#include <linux/jiffies.h>
#include <asm/therm_throt.h>
/* How long to wait between reporting thermal events */
......@@ -27,6 +26,7 @@
static DEFINE_PER_CPU(__u64, next_check) = INITIAL_JIFFIES;
static DEFINE_PER_CPU(unsigned long, thermal_throttle_count);
atomic_t therm_throt_en = ATOMIC_INIT(0);
#ifdef CONFIG_SYSFS
......@@ -110,10 +110,11 @@ int therm_throt_process(int curr)
}
#ifdef CONFIG_SYSFS
/* Add/Remove thermal_throttle interface for CPU device */
/* Add/Remove thermal_throttle interface for CPU device: */
static __cpuinit int thermal_throttle_add_dev(struct sys_device *sys_dev)
{
return sysfs_create_group(&sys_dev->kobj, &thermal_throttle_attr_group);
return sysfs_create_group(&sys_dev->kobj,
&thermal_throttle_attr_group);
}
static __cpuinit void thermal_throttle_remove_dev(struct sys_device *sys_dev)
......@@ -121,11 +122,12 @@ static __cpuinit void thermal_throttle_remove_dev(struct sys_device *sys_dev)
sysfs_remove_group(&sys_dev->kobj, &thermal_throttle_attr_group);
}
/* Mutex protecting device creation against CPU hotplug */
/* Mutex protecting device creation against CPU hotplug: */
static DEFINE_MUTEX(therm_cpu_lock);
/* Get notified when a cpu comes on/off. Be hotplug friendly. */
static __cpuinit int thermal_throttle_cpu_callback(struct notifier_block *nfb,
static __cpuinit int
thermal_throttle_cpu_callback(struct notifier_block *nfb,
unsigned long action,
void *hcpu)
{
......@@ -134,6 +136,7 @@ static __cpuinit int thermal_throttle_cpu_callback(struct notifier_block *nfb,
int err = 0;
sys_dev = get_cpu_sysdev(cpu);
switch (action) {
case CPU_UP_PREPARE:
case CPU_UP_PREPARE_FROZEN:
......
......@@ -17,7 +17,7 @@ static void default_threshold_interrupt(void)
void (*mce_threshold_vector)(void) = default_threshold_interrupt;
asmlinkage void mce_threshold_interrupt(void)
asmlinkage void smp_threshold_interrupt(void)
{
exit_idle();
irq_enter();
......
......@@ -2,11 +2,10 @@
* IDT Winchip specific Machine Check Exception Reporting
* (C) Copyright 2002 Alan Cox <alan@lxorguk.ukuu.org.uk>
*/
#include <linux/init.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/init.h>
#include <asm/processor.h>
#include <asm/system.h>
......@@ -14,7 +13,7 @@
#include "mce.h"
/* Machine check handler for WinChip C6 */
/* Machine check handler for WinChip C6: */
static void winchip_machine_check(struct pt_regs *regs, long error_code)
{
printk(KERN_EMERG "CPU0: Machine Check Exception.\n");
......@@ -25,12 +24,18 @@ static void winchip_machine_check(struct pt_regs *regs, long error_code)
void winchip_mcheck_init(struct cpuinfo_x86 *c)
{
u32 lo, hi;
machine_check_vector = winchip_machine_check;
/* Make sure the vector pointer is visible before we enable MCEs: */
wmb();
rdmsr(MSR_IDT_FCR1, lo, hi);
lo |= (1<<2); /* Enable EIERRINT (int 18 MCE) */
lo &= ~(1<<4); /* Enable MCE */
wrmsr(MSR_IDT_FCR1, lo, hi);
set_in_cr4(X86_CR4_MCE);
printk(KERN_INFO "Winchip machine check reporting enabled on CPU#0.\n");
printk(KERN_INFO
"Winchip machine check reporting enabled on CPU#0.\n");
}
......@@ -963,6 +963,8 @@ END(\sym)
#ifdef CONFIG_SMP
apicinterrupt IRQ_MOVE_CLEANUP_VECTOR \
irq_move_cleanup_interrupt smp_irq_move_cleanup_interrupt
apicinterrupt REBOOT_VECTOR \
reboot_interrupt smp_reboot_interrupt
#endif
#ifdef CONFIG_X86_UV
......@@ -994,10 +996,15 @@ apicinterrupt INVALIDATE_TLB_VECTOR_START+7 \
#endif
apicinterrupt THRESHOLD_APIC_VECTOR \
threshold_interrupt mce_threshold_interrupt
threshold_interrupt smp_threshold_interrupt
apicinterrupt THERMAL_APIC_VECTOR \
thermal_interrupt smp_thermal_interrupt
#ifdef CONFIG_X86_MCE
apicinterrupt MCE_SELF_VECTOR \
mce_self_interrupt smp_mce_self_interrupt
#endif
#ifdef CONFIG_SMP
apicinterrupt CALL_FUNCTION_SINGLE_VECTOR \
call_function_single_interrupt smp_call_function_single_interrupt
......@@ -1379,7 +1386,7 @@ errorentry xen_stack_segment do_stack_segment
errorentry general_protection do_general_protection
errorentry page_fault do_page_fault
#ifdef CONFIG_X86_MCE
paranoidzeroentry machine_check do_machine_check
paranoidzeroentry machine_check *machine_check_vector(%rip)
#endif
/*
......
......@@ -12,6 +12,7 @@
#include <asm/io_apic.h>
#include <asm/irq.h>
#include <asm/idle.h>
#include <asm/mce.h>
#include <asm/hw_irq.h>
atomic_t irq_err_count;
......@@ -96,12 +97,22 @@ static int show_other_interrupts(struct seq_file *p, int prec)
for_each_online_cpu(j)
seq_printf(p, "%10u ", irq_stats(j)->irq_thermal_count);
seq_printf(p, " Thermal event interrupts\n");
# ifdef CONFIG_X86_64
# ifdef CONFIG_X86_MCE_THRESHOLD
seq_printf(p, "%*s: ", prec, "THR");
for_each_online_cpu(j)
seq_printf(p, "%10u ", irq_stats(j)->irq_threshold_count);
seq_printf(p, " Threshold APIC interrupts\n");
# endif
#endif
#ifdef CONFIG_X86_NEW_MCE
seq_printf(p, "%*s: ", prec, "MCE");
for_each_online_cpu(j)
seq_printf(p, "%10u ", per_cpu(mce_exception_count, j));
seq_printf(p, " Machine check exceptions\n");
seq_printf(p, "%*s: ", prec, "MCP");
for_each_online_cpu(j)
seq_printf(p, "%10u ", per_cpu(mce_poll_count, j));
seq_printf(p, " Machine check polls\n");
#endif
seq_printf(p, "%*s: %10u\n", prec, "ERR", atomic_read(&irq_err_count));
#if defined(CONFIG_X86_IO_APIC)
......@@ -185,9 +196,13 @@ u64 arch_irq_stat_cpu(unsigned int cpu)
#endif
#ifdef CONFIG_X86_MCE
sum += irq_stats(cpu)->irq_thermal_count;
# ifdef CONFIG_X86_64
# ifdef CONFIG_X86_MCE_THRESHOLD
sum += irq_stats(cpu)->irq_threshold_count;
# endif
#endif
#ifdef CONFIG_X86_NEW_MCE
sum += per_cpu(mce_exception_count, cpu);
sum += per_cpu(mce_poll_count, cpu);
#endif
return sum;
}
......
......@@ -173,6 +173,9 @@ static void __init smp_intr_init(void)
/* Low priority IPI to cleanup after moving an irq */
set_intr_gate(IRQ_MOVE_CLEANUP_VECTOR, irq_move_cleanup_interrupt);
set_bit(IRQ_MOVE_CLEANUP_VECTOR, used_vectors);
/* IPI used for rebooting/stopping */
alloc_intr_gate(REBOOT_VECTOR, reboot_interrupt);
#endif
#endif /* CONFIG_SMP */
}
......
......@@ -24,11 +24,11 @@
#include <asm/ucontext.h>
#include <asm/i387.h>
#include <asm/vdso.h>
#include <asm/mce.h>
#ifdef CONFIG_X86_64
#include <asm/proto.h>
#include <asm/ia32_unistd.h>
#include <asm/mce.h>
#endif /* CONFIG_X86_64 */
#include <asm/syscall.h>
......@@ -856,10 +856,10 @@ static void do_signal(struct pt_regs *regs)
void
do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
{
#if defined(CONFIG_X86_64) && defined(CONFIG_X86_MCE)
#ifdef CONFIG_X86_NEW_MCE
/* notify userspace of pending MCEs */
if (thread_info_flags & _TIF_MCE_NOTIFY)
mce_notify_user();
mce_notify_process();
#endif /* CONFIG_X86_64 && CONFIG_X86_MCE */
/* deal with pending signal delivery */
......
......@@ -150,14 +150,40 @@ void native_send_call_func_ipi(const struct cpumask *mask)
* this function calls the 'stop' function on all other CPUs in the system.
*/
asmlinkage void smp_reboot_interrupt(void)
{
ack_APIC_irq();
irq_enter();
stop_this_cpu(NULL);
irq_exit();
}
static void native_smp_send_stop(void)
{
unsigned long flags;
unsigned long wait;
if (reboot_force)
return;
smp_call_function(stop_this_cpu, NULL, 0);
/*
* Use an own vector here because smp_call_function
* does lots of things not suitable in a panic situation.
* On most systems we could also use an NMI here,
* but there are a few systems around where NMI
* is problematic so stay with an non NMI for now
* (this implies we cannot stop CPUs spinning with irq off
* currently)
*/
if (num_online_cpus() > 1) {
apic->send_IPI_allbutself(REBOOT_VECTOR);
/* Don't wait longer than a second */
wait = USEC_PER_SEC;
while (num_online_cpus() > 1 && wait--)
udelay(1);
}
local_irq_save(flags);
disable_local_APIC();
local_irq_restore(flags);
......
......@@ -798,15 +798,15 @@ unsigned long patch_espfix_desc(unsigned long uesp, unsigned long kesp)
return new_kesp;
}
#else
#endif
asmlinkage void __attribute__((weak)) smp_thermal_interrupt(void)
{
}
asmlinkage void __attribute__((weak)) mce_threshold_interrupt(void)
asmlinkage void __attribute__((weak)) smp_threshold_interrupt(void)
{
}
#endif
/*
* 'math_state_restore()' saves the current math information in the
......
......@@ -757,6 +757,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
wake_up_idle_cpu(cpu);
spin_unlock_irqrestore(&base->lock, flags);
}
EXPORT_SYMBOL_GPL(add_timer_on);
/**
* del_timer - deactive a timer.
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment