printk: Reduce console_unblank() usage in unsafe scenarios

A semaphore is not NMI-safe, even when using down_trylock(). Both down_trylock() and up() are using internal spinlocks and up() might even call wake_up_process(). In the panic() code path it gets even worse because the internal spinlocks of the semaphore may have been taken by a CPU that has been stopped. To reduce the risk of deadlocks caused by the console semaphore in the panic path, make the following changes: - First check if any consoles have implemented the unblank() callback. If not, then there is no reason to take the console semaphore anyway. (This check is also useful for the non-panic path since the locking/unlocking of the console lock can be quite expensive due to console printing.) - If the panic path is in NMI context, bail out without attempting to take the console semaphore or calling any unblank() callbacks. Bailing out is acceptable because console_unblank() would already bail out if the console semaphore is contended. The alternative of ignoring the console semaphore and calling the unblank() callbacks anyway is a bad idea because these callbacks are also not NMI-safe. If consoles with unblank() callbacks exist and console_unblank() is called from a non-NMI panic context, it will still attempt a down_trylock(). This could still result in a deadlock if one of the stopped CPUs is holding the semaphore internal spinlock. But this is a risk that the kernel has been (and continues to be) willing to take. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230717194607.145135-3-john.ogness@linutronix.de

printk: Reduce console_unblank() usage in unsafe scenarios
A semaphore is not NMI-safe, even when using down_trylock(). Both down_trylock() and up() are using internal spinlocks and up() might even call wake_up_process(). In the panic() code path it gets even worse because the internal spinlocks of the semaphore may have been taken by a CPU that has been stopped. To reduce the risk of deadlocks caused by the console semaphore in the panic path, make the following changes: - First check if any consoles have implemented the unblank() callback. If not, then there is no reason to take the console semaphore anyway. (This check is also useful for the non-panic path since the locking/unlocking of the console lock can be quite expensive due to console printing.) - If the panic path is in NMI context, bail out without attempting to take the console semaphore or calling any unblank() callbacks. Bailing out is acceptable because console_unblank() would already bail out if the console semaphore is contended. The alternative of ignoring the console semaphore and calling the unblank() callbacks anyway is a bad idea because these callbacks are also not NMI-safe. If consoles with unblank() callbacks exist and console_unblank() is called from a non-NMI panic context, it will still attempt a down_trylock(). This could still result in a deadlock if one of the stopped CPUs is holding the semaphore internal spinlock. But this is a risk that the kernel has been (and continues to be) willing to take. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230717194607.145135-3-john.ogness@linutronix.de
7b23a66d · John Ogness · Petr Mladek · 6d3e0d8c · 7b23a66d
Commit 7b23a66d authored Jul 17, 2023 by John Ogness Committed by Petr Mladek Jul 20, 2023
Hide whitespace changes
Inline Side-by-side

Showing with 28 additions and 0 deletions

kernel/printk/printk.c kernel/printk/printk.c +28 -0

No files found.
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3043,9 +3043,27 @@ EXPORT_SYMBOL(console_conditional_schedule);

 void console_unblank(void)
 {
+	bool found_unblank = false;
 	struct console *c;
 	int cookie;

+	/*
+	 * First check if there are any consoles implementing the unblank()
+	 * callback. If not, there is no reason to continue and take the
+	 * console lock, which in particular can be dangerous if
+	 * @oops_in_progress is set.
+	 */
+	cookie = console_srcu_read_lock();
+	for_each_console_srcu(c) {
+		if ((console_srcu_read_flags(c) & CON_ENABLED) && c->unblank) {
+			found_unblank = true;
+			break;
+		}
+	}
+	console_srcu_read_unlock(cookie);
+	if (!found_unblank)
+		return;
+
 	/*
 	 * Stop console printing because the unblank() callback may
 	 * assume the console is not within its write() callback.
@@ -3054,6 +3072,16 @@ void console_unblank(void)
 	 * In that case, attempt a trylock as best-effort.
 	 */
 	if (oops_in_progress) {
+		/* Semaphores are not NMI-safe. */
+		if (in_nmi())
+			return;
+
+		/*
+		 * Attempting to trylock the console lock can deadlock
+		 * if another CPU was stopped while modifying the
+		 * semaphore. "Hope and pray" that this is not the
+		 * current situation.
+		 */
 		if (down_trylock_console_sem() != 0)
 			return;
 	} else