Commit 405963b6 authored by Waiman Long, committed by Ingo Molnar

locking/qrwlock: Don't contend with readers when setting _QW_WAITING

The current cmpxchg() loop that sets the _QW_WAITING flag for writers
in queue_write_lock_slowpath() contends with incoming readers, possibly
causing extra, wasteful cmpxchg() operations. This patch changes the
code to do a byte-wide cmpxchg() instead, eliminating contention with
new readers.
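
In sketch form (condensed from the diff below), the problem and the fix
look like this: the old loop re-reads the whole lock word, so any reader
that bumps the count between the atomic_read() and the atomic_cmpxchg()
makes the cmpxchg() fail and forces a retry even when no writer is
present; the new loop touches only the writer-mode byte, which readers
never modify:

	/* Old: word-wide cmpxchg() retries whenever cnts changes at all. */
	for (;;) {
		cnts = atomic_read(&lock->cnts);
		if (!(cnts & _QW_WMASK) &&
		    (atomic_cmpxchg(&lock->cnts, cnts,
				    cnts | _QW_WAITING) == cnts))
			break;
		cpu_relax_lowlatency();
	}

	/* New: byte-wide cmpxchg() on wmode ignores reader-count updates. */
	for (;;) {
		struct __qrwlock *l = (struct __qrwlock *)lock;

		if (!READ_ONCE(l->wmode) &&
		    (cmpxchg(&l->wmode, 0, _QW_WAITING) == 0))
			break;
		cpu_relax_lowlatency();
	}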

A multithreaded microbenchmark running a 5M read_lock/write_lock loop
on an 8-socket, 80-core Westmere-EX machine running a 4.0-based kernel
with the qspinlock patch shows the following execution times (in ms)
with and without the patch:

With R:W ratio = 5:1

	Threads	   w/o patch	with patch	% change
	-------	   ---------	----------	--------
	   2	     990	    895		  -9.6%
	   3	    2136	   1912		 -10.5%
	   4	    3166	   2830		 -10.6%
	   5	    3953	   3629		  -8.2%
	   6	    4628	   4405		  -4.8%
	   7	    5344	   5197		  -2.8%
	   8	    6065	   6004		  -1.0%
	   9	    6826	   6811		  -0.2%
	  10	    7599	   7599		   0.0%
	  15	    9757	   9766		  +0.1%
	  20	   13767	  13817		  +0.4%

With a small number of contending threads, this patch can improve
locking performance by up to 10%. With more contending threads,
however, the gain diminishes.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433863153-30722-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parent 92ae1837
@@ -22,6 +22,26 @@
 #include <linux/hardirq.h>
 #include <asm/qrwlock.h>
 
+/*
+ * This internal data structure is used for optimizing access to some of
+ * the subfields within the atomic_t cnts.
+ */
+struct __qrwlock {
+	union {
+		atomic_t cnts;
+		struct {
+#ifdef __LITTLE_ENDIAN
+			u8 wmode;	/* Writer mode   */
+			u8 rcnts[3];	/* Reader counts */
+#else
+			u8 rcnts[3];	/* Reader counts */
+			u8 wmode;	/* Writer mode   */
+#endif
+		};
+	};
+	arch_spinlock_t	lock;
+};
+
 /**
  * rspin_until_writer_unlock - inc reader count & spin until writer is gone
  * @lock : Pointer to queue rwlock structure
@@ -107,10 +127,10 @@ void queue_write_lock_slowpath(struct qrwlock *lock)
 	 * or wait for a previous writer to go away.
 	 */
 	for (;;) {
-		cnts = atomic_read(&lock->cnts);
-		if (!(cnts & _QW_WMASK) &&
-		    (atomic_cmpxchg(&lock->cnts, cnts,
-				    cnts | _QW_WAITING) == cnts))
+		struct __qrwlock *l = (struct __qrwlock *)lock;
+
+		if (!READ_ONCE(l->wmode) &&
+		    (cmpxchg(&l->wmode, 0, _QW_WAITING) == 0))
 			break;
 
 		cpu_relax_lowlatency();
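
For context on the union trick above: on a little-endian machine the
writer-mode byte is the low byte of the 32-bit count, so a byte-wide
cmpxchg() on it cannot conflict with reader-count updates in the upper
three bytes. A minimal, self-contained userspace sketch of the overlay
(illustrative only, not kernel code; field names mirror struct
__qrwlock, and the flag value here stands in for _QW_WAITING):

	#include <stdint.h>
	#include <stdio.h>

	union qword {
		uint32_t cnts;			/* whole-word view (atomic_t in the kernel) */
		struct {			/* byte view, little-endian layout */
			uint8_t wmode;		/* writer mode: low byte */
			uint8_t rcnts[3];	/* reader counts: upper three bytes */
		};
	};

	int main(void)
	{
		union qword w = { .cnts = 0 };

		w.rcnts[0] = 2;	/* two readers -> bits 8..15 of cnts */
		w.wmode = 1;	/* a _QW_WAITING-style flag in the low byte */

		/* Prints "cnts = 0x00000201" on a little-endian machine. */
		printf("cnts = 0x%08x\n", w.cnts);
		return 0;
	}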