Merge branch 'vmwgfx-next' of git://people.freedesktop.org/~thomash/linux into drm-next

A patchset worked out together with Peter Zijlstra. Ingo is OK with taking it through the DRM tree: This is a small fallout from a work to allow batching WW mutex locks and unlocks. Our Wound-Wait mutexes actually don't use the Wound-Wait algorithm but the Wait-Die algorithm. One could perhaps rename those mutexes tree-wide to "Wait-Die mutexes" or "Deadlock Avoidance mutexes". Another approach suggested here is to implement also the "Wound-Wait" algorithm as a per-WW-class choice, as it has advantages in some cases. See for example http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/8-recv+serial/deadlock-compare.html Now Wound-Wait is a preemptive algorithm, and the preemption is implemented using a lazy scheme: If a wounded transaction is about to go to sleep on a contended WW mutex, we return -EDEADLK. That is sufficient for deadlock prevention. Since with WW mutexes we also require the aborted transaction to sleep waiting to lock the WW mutex it was aborted on, this choice also provides a suitable WW mutex to sleep on. If we were to return -EDEADLK on the first WW mutex lock after the transaction was wounded whether the WW mutex was contended or not, the transaction might frequently be restarted without a wait, which is far from optimal. Note also that with the lazy preemption scheme, contrary to Wait-Die there will be no rollbacks on lock contention of locks held by a transaction that has completed its locking sequence. The modeset locks are then changed from Wait-Die to Wound-Wait since the typical locking pattern of those locks very well matches the criterion for a substantial reduction in the number of rollbacks. For reservation objects, the benefit is more unclear at this point and they remain using Wait-Die. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180703105339.4461-1-thellstrom@vmware.com

Merge branch 'vmwgfx-next' of git://people.freedesktop.org/~thomash/linux into drm-next
A patchset worked out together with Peter Zijlstra. Ingo is OK with taking it through the DRM tree: This is a small fallout from a work to allow batching WW mutex locks and unlocks. Our Wound-Wait mutexes actually don't use the Wound-Wait algorithm but the Wait-Die algorithm. One could perhaps rename those mutexes tree-wide to "Wait-Die mutexes" or "Deadlock Avoidance mutexes". Another approach suggested here is to implement also the "Wound-Wait" algorithm as a per-WW-class choice, as it has advantages in some cases. See for example http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/8-recv+serial/deadlock-compare.html Now Wound-Wait is a preemptive algorithm, and the preemption is implemented using a lazy scheme: If a wounded transaction is about to go to sleep on a contended WW mutex, we return -EDEADLK. That is sufficient for deadlock prevention. Since with WW mutexes we also require the aborted transaction to sleep waiting to lock the WW mutex it was aborted on, this choice also provides a suitable WW mutex to sleep on. If we were to return -EDEADLK on the first WW mutex lock after the transaction was wounded whether the WW mutex was contended or not, the transaction might frequently be restarted without a wait, which is far from optimal. Note also that with the lazy preemption scheme, contrary to Wait-Die there will be no rollbacks on lock contention of locks held by a transaction that has completed its locking sequence. The modeset locks are then changed from Wait-Die to Wound-Wait since the typical locking pattern of those locks very well matches the criterion for a substantial reduction in the number of rollbacks. For reservation objects, the benefit is more unclear at this point and they remain using Wait-Die. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180703105339.4461-1-thellstrom@vmware.com
c5be9b54 · Dave Airlie · 96b2bb0b · 07c13bb7 · c5be9b54 · c5be9b54
Commit c5be9b54 authored Jul 06, 2018 by Dave Airlie
7 changed files
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
-Wait/Wound Deadlock-Proof Mutex Design
+Wound/Wait Deadlock-Proof Mutex Design
 ======================================

 Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
@@ -32,10 +32,26 @@ the oldest task) wins, and the one with the higher reservation id (i.e. the
 younger task) unlocks all of the buffers that it has already locked, and then
 tries again.

-In the RDBMS literature this deadlock handling approach is called wait/wound:
-The older tasks waits until it can acquire the contended lock. The younger tasks
-needs to back off and drop all the locks it is currently holding, i.e. the
-younger task is wounded.
+In the RDBMS literature, a reservation ticket is associated with a transaction.
+and the deadlock handling approach is called Wait-Die. The name is based on
+the actions of a locking thread when it encounters an already locked mutex.
+If the transaction holding the lock is younger, the locking transaction waits.
+If the transaction holding the lock is older, the locking transaction backs off
+and dies. Hence Wait-Die.
+There is also another algorithm called Wound-Wait:
+If the transaction holding the lock is younger, the locking transaction
+wounds the transaction holding the lock, requesting it to die.
+If the transaction holding the lock is older, it waits for the other
+transaction. Hence Wound-Wait.
+The two algorithms are both fair in that a transaction will eventually succeed.
+However, the Wound-Wait algorithm is typically stated to generate fewer backoffs
+compared to Wait-Die, but is, on the other hand, associated with more work than
+Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
+algorithm in that transactions are wounded by other transactions, and that
+requires a reliable way to pick up up the wounded condition and preempt the
+running transaction. Note that this is not the same as process preemption. A
+Wound-Wait transaction is considered preempted when it dies (returning
+-EDEADLK) following a wound.

 Concepts
 --------
@@ -47,18 +63,20 @@ Acquire context: To ensure eventual forward progress it is important the a task
 trying to acquire locks doesn't grab a new reservation id, but keeps the one it
 acquired when starting the lock acquisition. This ticket is stored in the
 acquire context. Furthermore the acquire context keeps track of debugging state
-to catch w/w mutex interface abuse.
+to catch w/w mutex interface abuse. An acquire context is representing a
+transaction.

 W/w class: In contrast to normal mutexes the lock class needs to be explicit for
-w/w mutexes, since it is required to initialize the acquire context.
+w/w mutexes, since it is required to initialize the acquire context. The lock
+class also specifies what algorithm to use, Wound-Wait or Wait-Die.

 Furthermore there are three different class of w/w lock acquire functions:

 * Normal lock acquisition with a context, using ww_mutex_lock.

-* Slowpath lock acquisition on the contending lock, used by the wounded task
-  after having dropped all already acquired locks. These functions have the
-  _slow postfix.
+* Slowpath lock acquisition on the contending lock, used by the task that just
+  killed its transaction after having dropped all already acquired locks.
+  These functions have the _slow postfix.

  From a simple semantics point-of-view the _slow functions are not strictly
  required, since simply calling the normal ww_mutex_lock functions on the
@@ -90,6 +108,12 @@ provided.
 Usage
 -----

+The algorithm (Wait-Die vs Wound-Wait) is chosen by using either
+DEFINE_WW_CLASS() (Wound-Wait) or DEFINE_WD_CLASS() (Wait-Die)
+As a rough rule of thumb, use Wound-Wait iff you
+expect the number of simultaneous competing transactions to be typically small,
+and you want to reduce the number of rollbacks.
+
 Three different ways to acquire locks within the same w/w class. Common
 definitions for methods #1 and #2:

@@ -220,7 +244,7 @@ mutexes are a natural fit for such a case for two reasons:

 Note that this approach differs in two important ways from the above methods:
 - Since the list of objects is dynamically constructed (and might very well be
-  different when retrying due to hitting the -EDEADLK wound condition) there's
+  different when retrying due to hitting the -EDEADLK die condition) there's
  no need to keep any object on a persistent list when it's not locked. We can
  therefore move the list_head into the object itself.
 - On the other hand the dynamic object list construction also means that the -EALREADY return
@@ -312,12 +336,23 @@ Design:
  We maintain the following invariants for the wait list:
  (1) Waiters with an acquire context are sorted by stamp order; waiters
      without an acquire context are interspersed in FIFO order.
-  (2) Among waiters with contexts, only the first one can have other locks
-      acquired already (ctx->acquired > 0). Note that this waiter may come
-      after other waiters without contexts in the list.
+  (2) For Wait-Die, among waiters with contexts, only the first one can have
+      other locks acquired already (ctx->acquired > 0). Note that this waiter
+      may come after other waiters without contexts in the list.
+
+  The Wound-Wait preemption is implemented with a lazy-preemption scheme:
+  The wounded status of the transaction is checked only when there is
+  contention for a new lock and hence a true chance of deadlock. In that
+  situation, if the transaction is wounded, it backs off, clears the
+  wounded status and retries. A great benefit of implementing preemption in
+  this way is that the wounded transaction can identify a contending lock to
+  wait for before restarting the transaction. Just blindly restarting the
+  transaction would likely make the transaction end up in a situation where
+  it would have to back off again.

  In general, not much contention is expected. The locks are typically used to
-  serialize access to resources for devices.
+  serialize access to resources for devices, and optimization focus should
+  therefore be directed towards the uncontended cases.

 Lockdep:
  Special care has been taken to warn for as many cases of api abuse

--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -46,7 +46,7 @@
 * write-side updates.
 */

-DEFINE_WW_CLASS(reservation_ww_class);
+DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);

 struct lock_class_key reservation_seqcount_class;

--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -6,8 +6,10 @@
 *
 *  Copyright (C) 2004, 2005, 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
 *
- * Wound/wait implementation:
+ * Wait/Die implementation:
 *  Copyright (C) 2013 Canonical Ltd.
+ * Choice of algorithm:
+ *  Copyright (C) 2018 WMWare Inc.
 *
 * This file contains the main data structure and API definitions.
 */
@@ -23,14 +25,17 @@ struct ww_class {
 	struct lock_class_key mutex_key;
 	const char *acquire_name;
 	const char *mutex_name;
+	unsigned int is_wait_die;
 };

 struct ww_acquire_ctx {
 	struct task_struct *task;
 	unsigned long stamp;
-	unsigned acquired;
+	unsigned int acquired;
+	unsigned short wounded;
+	unsigned short is_wait_die;
 #ifdef CONFIG_DEBUG_MUTEXES
-	unsigned done_acquire;
+	unsigned int done_acquire;
 	struct ww_class *ww_class;
 	struct ww_mutex *contending_lock;
 #endif
@@ -38,8 +43,8 @@ struct ww_acquire_ctx {
 	struct lockdep_map dep_map;
 #endif
 #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
-	unsigned deadlock_inject_interval;
-	unsigned deadlock_inject_countdown;
+	unsigned int deadlock_inject_interval;
+	unsigned int deadlock_inject_countdown;
 #endif
 };

@@ -58,17 +63,21 @@ struct ww_mutex {
 # define __WW_CLASS_MUTEX_INITIALIZER(lockname, class)
 #endif

-#define __WW_CLASS_INITIALIZER(ww_class) \
+#define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die)	    \
 		{ .stamp = ATOMIC_LONG_INIT(0) \
 		, .acquire_name = #ww_class "_acquire" \
-		, .mutex_name = #ww_class "_mutex" }
+		, .mutex_name = #ww_class "_mutex" \
+		, .is_wait_die = _is_wait_die }

 #define __WW_MUTEX_INITIALIZER(lockname, class) \
 		{ .base =  __MUTEX_INITIALIZER(lockname.base) \
 		__WW_CLASS_MUTEX_INITIALIZER(lockname, class) }

+#define DEFINE_WD_CLASS(classname) \
+	struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 1)
+
 #define DEFINE_WW_CLASS(classname) \
-	struct ww_class classname = __WW_CLASS_INITIALIZER(classname)
+	struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 0)

 #define DEFINE_WW_MUTEX(mutexname, ww_class) \
 	struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class)
@@ -102,7 +111,7 @@ static inline void ww_mutex_init(struct ww_mutex *lock,
 *
 * Context-based w/w mutex acquiring can be done in any order whatsoever within
 * a given lock class. Deadlocks will be detected and handled with the
- * wait/wound logic.
+ * wait/die logic.
 *
 * Mixing of context-based w/w mutex acquiring and single w/w mutex locking can
 * result in undetected deadlocks and is so forbidden. Mixing different contexts
@@ -123,6 +132,8 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
 	ctx->task = current;
 	ctx->stamp = atomic_long_inc_return_relaxed(&ww_class->stamp);
 	ctx->acquired = 0;
+	ctx->wounded = false;
+	ctx->is_wait_die = ww_class->is_wait_die;
 #ifdef CONFIG_DEBUG_MUTEXES
 	ctx->ww_class = ww_class;
 	ctx->done_acquire = 0;
@@ -195,13 +206,13 @@ static inline void ww_acquire_fini(struct ww_acquire_ctx *ctx)
 * Lock the w/w mutex exclusively for this task.
 *
 * Deadlocks within a given w/w class of locks are detected and handled with the
- * wait/wound algorithm. If the lock isn't immediately avaiable this function
+ * wait/die algorithm. If the lock isn't immediately available this function
 * will either sleep until it is (wait case). Or it selects the current context
- * for backing off by returning -EDEADLK (wound case). Trying to acquire the
+ * for backing off by returning -EDEADLK (die case). Trying to acquire the
 * same lock with the same context twice is also detected and signalled by
 * returning -EALREADY. Returns 0 if the mutex was successfully acquired.
 *
- * In the wound case the caller must release all currently held w/w mutexes for
+ * In the die case the caller must release all currently held w/w mutexes for
 * the given context and then wait for this contending lock to be available by
 * calling ww_mutex_lock_slow. Alternatively callers can opt to not acquire this
 * lock and proceed with trying to acquire further w/w mutexes (e.g. when
@@ -226,14 +237,14 @@ extern int /* __must_check */ ww_mutex_lock(struct ww_mutex *lock, struct ww_acq
 * Lock the w/w mutex exclusively for this task.
 *
 * Deadlocks within a given w/w class of locks are detected and handled with the
- * wait/wound algorithm. If the lock isn't immediately avaiable this function
+ * wait/die algorithm. If the lock isn't immediately available this function
 * will either sleep until it is (wait case). Or it selects the current context
- * for backing off by returning -EDEADLK (wound case). Trying to acquire the
+ * for backing off by returning -EDEADLK (die case). Trying to acquire the
 * same lock with the same context twice is also detected and signalled by
 * returning -EALREADY. Returns 0 if the mutex was successfully acquired. If a
 * signal arrives while waiting for the lock then this function returns -EINTR.
 *
- * In the wound case the caller must release all currently held w/w mutexes for
+ * In the die case the caller must release all currently held w/w mutexes for
 * the given context and then wait for this contending lock to be available by
 * calling ww_mutex_lock_slow_interruptible. Alternatively callers can opt to
 * not acquire this lock and proceed with trying to acquire further w/w mutexes
@@ -256,7 +267,7 @@ extern int __must_check ww_mutex_lock_interruptible(struct ww_mutex *lock,
 * @lock: the mutex to be acquired
 * @ctx: w/w acquire context
 *
- * Acquires a w/w mutex with the given context after a wound case. This function
+ * Acquires a w/w mutex with the given context after a die case. This function
 * will sleep until the lock becomes available.
 *
 * The caller must have released all w/w mutexes already acquired with the
@@ -290,7 +301,7 @@ ww_mutex_lock_slow(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 * @lock: the mutex to be acquired
 * @ctx: w/w acquire context
 *
- * Acquires a w/w mutex with the given context after a wound case. This function
+ * Acquires a w/w mutex with the given context after a die case. This function
 * will sleep until the lock becomes available and returns 0 when the lock has
 * been acquired. If a signal arrives while waiting for the lock then this
 * function returns -EINTR.

--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -365,7 +365,7 @@ static struct lock_torture_ops mutex_lock_ops = {
 };

 #include <linux/ww_mutex.h>
-static DEFINE_WW_CLASS(torture_ww_class);
+static DEFINE_WD_CLASS(torture_ww_class);
 static DEFINE_WW_MUTEX(torture_ww_mutex_0, &torture_ww_class);
 static DEFINE_WW_MUTEX(torture_ww_mutex_1, &torture_ww_class);
 static DEFINE_WW_MUTEX(torture_ww_mutex_2, &torture_ww_class);

--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -26,7 +26,7 @@
 #include <linux/slab.h>
 #include <linux/ww_mutex.h>

-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WD_CLASS(ww_class);
 struct workqueue_struct *wq;

 struct test_mutex {

--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -29,7 +29,7 @@
 */
 static unsigned int debug_locks_verbose;

-static DEFINE_WW_CLASS(ww_lockdep);
+static DEFINE_WD_CLASS(ww_lockdep);

 static int __init setup_debug_locks_verbose(char *str)
 {