Commit 195daf66, authored by Ulrich Obergfell, committed by Linus Torvalds

watchdog: enable the new user interface of the watchdog mechanism

With the current user interface of the watchdog mechanism it is only
possible to disable or enable both lockup detectors at the same time.
This series introduces new kernel parameters and changes the semantics of
some existing kernel parameters, so that the hard lockup detector and the
soft lockup detector can be disabled or enabled individually.  With this
series applied, the user interface is as follows.

- parameters in /proc/sys/kernel

  . soft_watchdog
    This is a new parameter to control and examine the run state of
    the soft lockup detector.

  . nmi_watchdog
    The semantics of this parameter have changed. It can now be used
    to control and examine the run state of the hard lockup detector.

  . watchdog
    This parameter is still available to control the run state of both
    lockup detectors at the same time. If this parameter is examined,
    it shows the logical OR of soft_watchdog and nmi_watchdog.

  . watchdog_thresh
    The semantics of this parameter are not affected by the patch.

- kernel command line parameters

  . nosoftlockup
    The semantics of this parameter have changed. It can now be used
    to disable the soft lockup detector at boot time.

  . nmi_watchdog=0 or nmi_watchdog=1
    Disable or enable the hard lockup detector at boot time. The patch
    introduces '=1' as a new option.

  . nowatchdog
    The semantics of this parameter are not affected by the patch. It
    is still available to disable both lockup detectors at boot time.

Also, remove the proc_dowatchdog() function which is no longer needed.
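
The run-state semantics of the three /proc/sys/kernel parameters described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: the bit values chosen for NMI_WATCHDOG_ENABLED and SOFT_WATCHDOG_ENABLED, the initial state, and all helper names (set_detector, set_watchdog, get_watchdog, and so on) are hypothetical and are not taken from the kernel implementation.

```c
/* User-space model of the watchdog run state; this is not kernel code. */

/* Assumed bit assignments, chosen for illustration only. */
#define NMI_WATCHDOG_ENABLED  (1u << 0)
#define SOFT_WATCHDOG_ENABLED (1u << 1)

/* In this model both detectors start out enabled (an assumption). */
static unsigned int watchdog_enabled =
	NMI_WATCHDOG_ENABLED | SOFT_WATCHDOG_ENABLED;

/* Hypothetical helper: set or clear the given detector bit(s). */
static void set_detector(unsigned int bits, int on)
{
	if (on)
		watchdog_enabled |= bits;
	else
		watchdog_enabled &= ~bits;
}

/* Models a write to /proc/sys/kernel/soft_watchdog. */
static void set_soft_watchdog(int on)
{
	set_detector(SOFT_WATCHDOG_ENABLED, on);
}

/* Models a write to /proc/sys/kernel/nmi_watchdog. */
static void set_nmi_watchdog(int on)
{
	set_detector(NMI_WATCHDOG_ENABLED, on);
}

/* Models a write to /proc/sys/kernel/watchdog: both detectors at once. */
static void set_watchdog(int on)
{
	set_detector(NMI_WATCHDOG_ENABLED | SOFT_WATCHDOG_ENABLED, on);
}

/* Models reads of the individual parameters. */
static int get_soft_watchdog(void)
{
	return (watchdog_enabled & SOFT_WATCHDOG_ENABLED) != 0;
}

static int get_nmi_watchdog(void)
{
	return (watchdog_enabled & NMI_WATCHDOG_ENABLED) != 0;
}

/* Models a read of 'watchdog': logical OR of the two detectors. */
static int get_watchdog(void)
{
	return get_soft_watchdog() || get_nmi_watchdog();
}
```

In this model, calling set_nmi_watchdog(0) leaves get_watchdog() at 1 as long as the soft lockup detector is still enabled, which matches the "logical OR" wording above.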

[dzickus@redhat.com: wrote changelog]
[dzickus@redhat.com: update documentation for kernel params and sysctl]
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent bcfba4f4
@@ -2236,8 +2236,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	nmi_watchdog=	[KNL,BUGS=X86] Debugging features for SMP kernels
 			Format: [panic,][nopanic,][num]
-			Valid num: 0
+			Valid num: 0 or 1
 			0 - turn nmi_watchdog off
+			1 - turn nmi_watchdog on
 			When panic is specified, panic when an NMI watchdog
 			timeout occurs (or 'nopanic' to override the opposite
 			default).
@@ -2464,7 +2465,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	nousb		[USB] Disable the USB subsystem
-	nowatchdog	[KNL] Disable the lockup detector (NMI watchdog).
+	nowatchdog	[KNL] Disable both lockup detectors, i.e.
+			soft-lockup and NMI watchdog (hard-lockup).
 	nowb		[ARM]
......
@@ -77,12 +77,14 @@ show up in /proc/sys/kernel:
 - shmmax                      [ sysv ipc ]
 - shmmni
 - softlockup_all_cpu_backtrace
+- soft_watchdog
 - stop-a                      [ SPARC only ]
 - sysrq                       ==> Documentation/sysrq.txt
 - sysctl_writes_strict
 - tainted
 - threads-max
 - unknown_nmi_panic
+- watchdog
 - watchdog_thresh
 - version
@@ -417,16 +419,23 @@ successful IPC object allocation.
 nmi_watchdog:
-Enables/Disables the NMI watchdog on x86 systems. When the value is
-non-zero the NMI watchdog is enabled and will continuously test all
-online cpus to determine whether or not they are still functioning
-properly. Currently, passing "nmi_watchdog=" parameter at boot time is
-required for this function to work.
-If LAPIC NMI watchdog method is in use (nmi_watchdog=2 kernel
-parameter), the NMI watchdog shares registers with oprofile. By
-disabling the NMI watchdog, oprofile may have more registers to
-utilize.
+This parameter can be used to control the NMI watchdog
+(i.e. the hard lockup detector) on x86 systems.
+   0 - disable the hard lockup detector
+   1 - enable the hard lockup detector
+The hard lockup detector monitors each CPU for its ability to respond to
+timer interrupts. The mechanism utilizes CPU performance counter registers
+that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
+while a CPU is busy. Hence, the alternative name 'NMI watchdog'.
+The NMI watchdog is disabled by default if the kernel is running as a guest
+in a KVM virtual machine. This default can be overridden by adding
+   nmi_watchdog=1
+to the guest kernel command line (see Documentation/kernel-parameters.txt).
 ==============================================================
@@ -816,6 +825,22 @@ NMI.
 ==============================================================
+soft_watchdog
+This parameter can be used to control the soft lockup detector.
+   0 - disable the soft lockup detector
+   1 - enable the soft lockup detector
+The soft lockup detector monitors CPUs for threads that are hogging the CPUs
+without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads
+from running. The mechanism depends on the CPUs ability to respond to timer
+interrupts which are needed for the 'watchdog/N' threads to be woken up by
+the watchdog timer function, otherwise the NMI watchdog - if enabled - can
+detect a hard lockup condition.
+==============================================================
 tainted:
 Non-zero if the kernel has been tainted. Numeric values, which
@@ -858,6 +883,25 @@ example. If a system hangs up, try pressing the NMI switch.
 ==============================================================
+watchdog:
+This parameter can be used to disable or enable the soft lockup detector
+_and_ the NMI watchdog (i.e. the hard lockup detector) at the same time.
+   0 - disable both lockup detectors
+   1 - enable both lockup detectors
+The soft lockup detector and the NMI watchdog can also be disabled or
+enabled individually, using the soft_watchdog and nmi_watchdog parameters.
+If the watchdog parameter is read, for example by executing
+   cat /proc/sys/kernel/watchdog
+the output of this command (0 or 1) shows the logical OR of soft_watchdog
+and nmi_watchdog.
+==============================================================
 watchdog_thresh:
 This value can be used to control the frequency of hrtimer and NMI
......
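
The documented relationship between the three sysctl files can be checked from user space. The following is a minimal sketch, not part of the patch: the helper names read_sysctl_int and watchdog_view_is_consistent are hypothetical, and the directory is passed in as an argument so that the check can also be pointed at a directory other than /proc/sys/kernel.

```c
#include <stdio.h>

/*
 * Read one integer from a sysctl-style text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Check that 'watchdog' equals the logical OR of 'soft_watchdog' and
 * 'nmi_watchdog' in the given directory.
 * Returns 1 if consistent, 0 if not, -1 if a file could not be read.
 */
static int watchdog_view_is_consistent(const char *dir)
{
	char path[256];
	int soft, nmi, both;

	snprintf(path, sizeof(path), "%s/soft_watchdog", dir);
	if (read_sysctl_int(path, &soft))
		return -1;
	snprintf(path, sizeof(path), "%s/nmi_watchdog", dir);
	if (read_sysctl_int(path, &nmi))
		return -1;
	snprintf(path, sizeof(path), "%s/watchdog", dir);
	if (read_sysctl_int(path, &both))
		return -1;

	return both == (soft || nmi);
}
```

On a kernel with this series applied, watchdog_view_is_consistent("/proc/sys/kernel") would be expected to return 1. The three reads are not atomic, so a concurrent write to any of the parameters could produce a transient mismatch.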
@@ -82,8 +82,6 @@ extern int proc_soft_watchdog(struct ctl_table *, int ,
 			      void __user *, size_t *, loff_t *);
 extern int proc_watchdog_thresh(struct ctl_table *, int ,
 				void __user *, size_t *, loff_t *);
-extern int proc_dowatchdog(struct ctl_table *, int ,
-			   void __user *, size_t *, loff_t *);
 #endif
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
......
@@ -846,7 +846,7 @@ static struct ctl_table kern_table[] = {
 		.data		= &watchdog_user_enabled,
 		.maxlen		= sizeof (int),
 		.mode		= 0644,
-		.proc_handler	= proc_dowatchdog,
+		.proc_handler	= proc_watchdog,
 		.extra1		= &zero,
 		.extra2		= &one,
 	},
@@ -855,10 +855,32 @@ static struct ctl_table kern_table[] = {
 		.data		= &watchdog_thresh,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dowatchdog,
+		.proc_handler	= proc_watchdog_thresh,
 		.extra1		= &zero,
 		.extra2		= &sixty,
 	},
+	{
+		.procname	= "nmi_watchdog",
+		.data		= &nmi_watchdog_enabled,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= proc_nmi_watchdog,
+		.extra1		= &zero,
+#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
+		.extra2		= &one,
+#else
+		.extra2		= &zero,
+#endif
+	},
+	{
+		.procname	= "soft_watchdog",
+		.data		= &soft_watchdog_enabled,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= proc_soft_watchdog,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 	{
 		.procname	= "softlockup_panic",
 		.data		= &softlockup_panic,
@@ -879,15 +901,6 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &one,
 	},
 #endif /* CONFIG_SMP */
-	{
-		.procname	= "nmi_watchdog",
-		.data		= &watchdog_user_enabled,
-		.maxlen		= sizeof (int),
-		.mode		= 0644,
-		.proc_handler	= proc_dowatchdog,
-		.extra1		= &zero,
-		.extra2		= &one,
-	},
 #endif
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
 	{
......
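
The .extra1 and .extra2 fields in the table entries above act as lower and upper bounds for values written through the sysctl file. The sketch below models that range check in user space, assuming that an out-of-range write is rejected with -EINVAL and leaves the stored value untouched. The function name write_bounded is hypothetical; the handlers in the patch (proc_nmi_watchdog, proc_soft_watchdog, and so on) do more than this.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of a bounded sysctl write: store 'value' in '*data'
 * only if it lies within [*min, *max]. A NULL bound means "no limit".
 * Returns 0 on success, -EINVAL if the value is out of range.
 */
static int write_bounded(int *data, int value, const int *min, const int *max)
{
	if ((min != NULL && value < *min) || (max != NULL && value > *max))
		return -EINVAL;

	*data = value;
	return 0;
}
```

With bounds of 0 and 1, only 0 and 1 are accepted. With both bounds pointing at zero, as in the nmi_watchdog entry when neither CONFIG_HAVE_NMI_WATCHDOG nor CONFIG_HARDLOCKUP_DETECTOR is defined, the only accepted value is 0, so the hard lockup detector cannot be switched on through that file.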
@@ -110,15 +110,9 @@ static int __init hardlockup_panic_setup(char *str)
 	else if (!strncmp(str, "nopanic", 7))
 		hardlockup_panic = 0;
 	else if (!strncmp(str, "0", 1))
-		watchdog_user_enabled = 0;
-	else if (!strncmp(str, "1", 1) || !strncmp(str, "2", 1)) {
-		/*
-		 * Setting 'nmi_watchdog=1' or 'nmi_watchdog=2' (legacy option)
-		 * has the same effect.
-		 */
-		watchdog_user_enabled = 1;
-		watchdog_enable_hardlockup_detector(true);
-	}
+		watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
+	else if (!strncmp(str, "1", 1))
+		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
 	return 1;
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
@@ -137,19 +131,18 @@ __setup("softlockup_panic=", softlockup_panic_setup);
 static int __init nowatchdog_setup(char *str)
 {
-	watchdog_user_enabled = 0;
+	watchdog_enabled = 0;
 	return 1;
 }
 __setup("nowatchdog", nowatchdog_setup);
-/* deprecated */
 static int __init nosoftlockup_setup(char *str)
 {
-	watchdog_user_enabled = 0;
+	watchdog_enabled &= ~SOFT_WATCHDOG_ENABLED;
 	return 1;
 }
 __setup("nosoftlockup", nosoftlockup_setup);
 /* */
 #ifdef CONFIG_SMP
 static int __init softlockup_all_cpu_backtrace_setup(char *str)
 {
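
The effect of the three boot parameters on the watchdog_enabled bitmask, as changed by the hunks above, can be traced with a small user-space model. This is a sketch only: the bit values of the two flags and the initial state are assumptions, the function names are hypothetical, and the "panic" branch is assumed because the diff context starts at the "nopanic" branch.

```c
#include <string.h>

/* Assumed bit assignments, chosen for illustration only. */
#define NMI_WATCHDOG_ENABLED  (1u << 0)
#define SOFT_WATCHDOG_ENABLED (1u << 1)

/* Model state: both detectors enabled before parameters are parsed. */
static unsigned int watchdog_enabled =
	NMI_WATCHDOG_ENABLED | SOFT_WATCHDOG_ENABLED;
static int hardlockup_panic;

/* Models "nmi_watchdog=<str>" using the same prefix matching as the patch. */
static void model_nmi_watchdog_param(const char *str)
{
	if (!strncmp(str, "panic", 5))		/* assumed branch */
		hardlockup_panic = 1;
	else if (!strncmp(str, "nopanic", 7))
		hardlockup_panic = 0;
	else if (!strncmp(str, "0", 1))
		watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
	else if (!strncmp(str, "1", 1))
		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
	/* Any other value, including the former legacy "2", is ignored. */
}

/* Models "nowatchdog": disable both lockup detectors. */
static void model_nowatchdog_param(void)
{
	watchdog_enabled = 0;
}

/* Models "nosoftlockup": disable only the soft lockup detector. */
static void model_nosoftlockup_param(void)
{
	watchdog_enabled &= ~SOFT_WATCHDOG_ENABLED;
}
```

In this model, "nosoftlockup" followed by "nmi_watchdog=1" leaves only NMI_WATCHDOG_ENABLED set, whereas before the patch either "nosoftlockup" or "nmi_watchdog=0" cleared the single watchdog_user_enabled flag.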
@@ -264,10 +257,11 @@ static int is_softlockup(unsigned long touch_ts)
 {
 	unsigned long now = get_timestamp();
-	/* Warn about unreasonable delays: */
-	if (time_after(now, touch_ts + get_softlockup_thresh()))
-		return now - touch_ts;
+	if (watchdog_enabled & SOFT_WATCHDOG_ENABLED) {
+		/* Warn about unreasonable delays. */
+		if (time_after(now, touch_ts + get_softlockup_thresh()))
+			return now - touch_ts;
+	}
 	return 0;
 }
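
The gated check in is_softlockup() can be exercised outside the kernel with explicit timestamps. The sketch below is a user-space model, not the kernel function: the timestamp and threshold are passed in as arguments, the flag's bit value is an assumption, time_after() is redefined locally in the usual wraparound-safe form, and the name is_softlockup_model is hypothetical.

```c
/* Wraparound-safe "a is after b" test on unsigned long counters. */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Assumed bit value, chosen for illustration only. */
#define SOFT_WATCHDOG_ENABLED (1u << 1)

/* Model state: the soft lockup detector starts out enabled. */
static unsigned int watchdog_enabled = SOFT_WATCHDOG_ENABLED;

/*
 * Returns how long the CPU has gone without touching the watchdog if that
 * exceeds 'thresh' and the soft lockup detector is enabled, otherwise 0.
 */
static unsigned long is_softlockup_model(unsigned long now,
					 unsigned long touch_ts,
					 unsigned long thresh)
{
	if (watchdog_enabled & SOFT_WATCHDOG_ENABLED) {
		/* Report a stall only once the threshold has been passed. */
		if (time_after(now, touch_ts + thresh))
			return now - touch_ts;
	}
	return 0;
}
```

Clearing SOFT_WATCHDOG_ENABLED in watchdog_enabled makes the function return 0 for every input, so a disabled soft lockup detector never reports a stall even if the timer callback still runs.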
@@ -532,6 +526,10 @@ static int watchdog_nmi_enable(unsigned int cpu)
 	struct perf_event_attr *wd_attr;
 	struct perf_event *event = per_cpu(watchdog_ev, cpu);
+	/* nothing to do if the hard lockup detector is disabled */
+	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+		goto out;
 	/*
 	 * Some kernels need to default hard lockup detection to
 	 * 'disabled', for example a guest on a hypervisor.
@@ -856,59 +854,12 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
 	mutex_unlock(&watchdog_proc_mutex);
 	return err;
 }
-/*
- * proc handler for /proc/sys/kernel/nmi_watchdog,watchdog_thresh
- */
-int proc_dowatchdog(struct ctl_table *table, int write,
-		    void __user *buffer, size_t *lenp, loff_t *ppos)
-{
-	int err, old_thresh, old_enabled;
-	bool old_hardlockup;
-	mutex_lock(&watchdog_proc_mutex);
-	old_thresh = ACCESS_ONCE(watchdog_thresh);
-	old_enabled = ACCESS_ONCE(watchdog_user_enabled);
-	old_hardlockup = watchdog_hardlockup_detector_is_enabled();
-	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-	if (err || !write)
-		goto out;
-	set_sample_period();
-	/*
-	 * Watchdog threads shouldn't be enabled if they are
-	 * disabled. The 'watchdog_running' variable check in
-	 * watchdog_*_all_cpus() function takes care of this.
-	 */
-	if (watchdog_user_enabled && watchdog_thresh) {
-		/*
-		 * Prevent a change in watchdog_thresh accidentally overriding
-		 * the enablement of the hardlockup detector.
-		 */
-		if (watchdog_user_enabled != old_enabled)
-			watchdog_enable_hardlockup_detector(true);
-		err = watchdog_enable_all_cpus(old_thresh != watchdog_thresh);
-	} else
-		watchdog_disable_all_cpus();
-	/* Restore old values on failure */
-	if (err) {
-		watchdog_thresh = old_thresh;
-		watchdog_user_enabled = old_enabled;
-		watchdog_enable_hardlockup_detector(old_hardlockup);
-	}
-out:
-	mutex_unlock(&watchdog_proc_mutex);
-	return err;
-}
 #endif /* CONFIG_SYSCTL */
 void __init lockup_detector_init(void)
 {
 	set_sample_period();
-	if (watchdog_user_enabled)
+	if (watchdog_enabled)
 		watchdog_enable_all_cpus(false);
 }