Commit f5bfdc8e authored by Waiman Long, committed by Peter Zijlstra

locking/osq: Use optimized spinning loop for arm64

Arm64 has a more optimized spinning loop (atomic_cond_read_acquire)
using wfe for spinlock that can boost performance of sibling threads
by putting the current cpu to a wait state that is broken only when
the monitored variable changes or an external event happens.

OSQ has a more complicated spinning loop. Besides the lock value, it
also checks for need_resched() and vcpu_is_preempted(). The check for
need_resched() is not a problem as it is only set by the tick interrupt
handler. That will be detected by the spinning cpu right after iret.

The vcpu_is_preempted() check, however, is a problem as changes to the
preempt state of the previous node will not affect the wait state. For
ARM64, vcpu_is_preempted is not currently defined and so is a no-op.
Will has indicated that he is planning to para-virtualize wfe instead
of defining vcpu_is_preempted for PV support. So just add a comment in
arch/arm64/include/asm/spinlock.h to indicate that vcpu_is_preempted()
should not be defined as suggested.
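
The loop semantics described above can be modeled in user space. The
following is a minimal sketch, not the kernel implementation: it assumes
GCC/Clang statement expressions and C11 atomics, and cond_load_relaxed(),
resched_flag and wait_for_lock() are hypothetical names used only for
illustration. The macro keeps reloading the variable until the condition
(which may refer to the loaded value as VAL) is true, then yields the
last loaded value.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler flag behind need_resched(). */
atomic_bool resched_flag;

bool need_resched(void)
{
	return atomic_load_explicit(&resched_flag, memory_order_relaxed);
}

/* Same definition the patch adds for arm64: the check compiles away. */
#define vcpu_is_preempted(cpu)	false

/*
 * User-space model of a conditional load: reload *ptr until cond_expr
 * holds, then evaluate to the last value loaded. A real implementation
 * would relax the cpu (or wait in wfe) between loads instead of
 * busy-polling.
 */
#define cond_load_relaxed(ptr, cond_expr) ({				\
	int VAL;							\
	for (;;) {							\
		VAL = atomic_load_explicit((ptr), memory_order_relaxed);\
		if (cond_expr)						\
			break;						\
	}								\
	VAL;								\
})

/*
 * Returns true when the lock was handed over (locked != 0), false when
 * the wait ended for another reason and the waiter should unqueue.
 */
bool wait_for_lock(atomic_int *locked)
{
	return cond_load_relaxed(locked, VAL || need_resched() ||
				 vcpu_is_preempted(0)) != 0;
}
```

Because the macro evaluates to the loaded value rather than to the
condition, the caller can tell "lock acquired" apart from "asked to
reschedule" with a single test on the result.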

On a 2-socket 56-core 224-thread ARM64 system, a kernel mutex locking
microbenchmark was run for 10s with and without the patch. The
performance numbers before patch were:

Running locktest with mutex [runtime = 10s, load = 1]
Threads = 224, Min/Mean/Max = 316/123,143/2,121,269
Threads = 224, Total Rate = 2,757 kop/s; Percpu Rate = 12 kop/s

After patch, the numbers were:

Running locktest with mutex [runtime = 10s, load = 1]
Threads = 224, Min/Mean/Max = 334/147,836/1,304,787
Threads = 224, Total Rate = 3,311 kop/s; Percpu Rate = 15 kop/s

So there was about a 20% performance improvement.
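
The figure can be checked from the reported numbers. A minimal sketch;
percent_gain() is a hypothetical helper, not part of locktest:

```c
/* Relative change from 'before' to 'after', expressed in percent. */
double percent_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Applied to the total rates (2,757 and 3,311 kop/s) this gives about
20.1%, and the mean per-thread counts (123,143 and 147,836) give
roughly the same ratio.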

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200113150735.21956-1-longman@redhat.com
parent 57097124
@@ -11,4 +11,13 @@
 /* See include/linux/spinlock.h */
 #define smp_mb__after_spinlock()	smp_mb()
 
+/*
+ * Changing this will break osq_lock() thanks to the call inside
+ * smp_cond_load_relaxed().
+ *
+ * See:
+ * https://lore.kernel.org/lkml/20200110100612.GC2827@hirez.programming.kicks-ass.net
+ */
+#define vcpu_is_preempted(cpu)	false
+
 #endif /* __ASM_SPINLOCK_H */
@@ -134,20 +134,17 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	 * cmpxchg in an attempt to undo our queueing.
 	 */
 
-	while (!READ_ONCE(node->locked)) {
-		/*
-		 * If we need to reschedule bail... so we can block.
-		 * Use vcpu_is_preempted() to avoid waiting for a preempted
-		 * lock holder:
-		 */
-		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
-			goto unqueue;
-
-		cpu_relax();
-	}
-	return true;
+	/*
+	 * Wait to acquire the lock or cancelation. Note that need_resched()
+	 * will come with an IPI, which will wake smp_cond_load_relaxed() if it
+	 * is implemented with a monitor-wait. vcpu_is_preempted() relies on
+	 * polling, be careful.
+	 */
+	if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() ||
+				  vcpu_is_preempted(node_cpu(node->prev))))
+		return true;
 
-unqueue:
+	/* unqueue */
 	/*
 	 * Step - A -- stabilize @prev
 	 *