• Vineet Gupta's avatar
    ARCv2: spinlock: remove the extra smp_mb before lock, after unlock · 3032f0c9
    Vineet Gupta authored
     - ARCv2 LLSC spinlocks have smp_mb() both before and after the LLSC
       instructions, which is not required per lkmm ACQ/REL semantics.
       smp_mb() is only needed _after_ lock and _before_ unlock.
       So remove the extra barriers.
       The reason they were there was mainly historical. At the time of
       initial SMP Linux bringup on HS38 cores, I was too conservative,
       given the fluidity of both hw and sw. The last attempt to ditch the
       extra barrier showed some hackbench regression which is apparently
       not the case now (atleast for LLSC case, read on...)
    
     - EX based spinlocks (!CONFIG_ARC_HAS_LLSC) still needs the extra
       smp_mb(), not due to lkmm, but due to some hardware shenanigans.
       W/o that, hackbench triggers RCU stall splat so extra DMB is retained
       !LLSC based systems are not realistic Linux sstem anyways so they can
       afford to be a nit suboptimal ;-)
    
       | [ARCLinux]# for i in (seq 1 1 5) ; do hackbench; done
       | Running with 10 groups 400 process
       | INFO: task hackbench:158 blocked for more than 10 seconds.
       |       Not tainted 4.20.0-00005-g96b18288a88e-dirty #117
       | "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       | hackbench       D    0   158    135 0x00000000
       |
       | Stack Trace:
       | watchdog: BUG: soft lockup - CPU#3 stuck for 59s! [hackbench:469]
       | Modules linked in:
       | Path: (null)
       | CPU: 3 PID: 469 Comm: hackbench Not tainted 4.20.0-00005-g96b18288a88e-dirty
       |
       | [ECR   ]: 0x00000000 => Check Programmer's Manual
       | [EFA   ]: 0x00000000
       | [BLINK ]: do_exit+0x4a6/0x7d0
       | [ERET  ]: _raw_write_unlock_irq+0x44/0x5c
    
     - And while at it, remove the extar smp_mb() from EX based
       arch_read_trylock() since the spin lock there guarantees a full
       barrier anyways
    
     - For LLSC case, hackbench threads improves with this patch (HAPS @ 50MHz)
    
       ---- before ----
       |
       | [ARCLinux]# for i in 1 2 3 4 5; do hackbench 10 thread; done
       | Running with 10 groups 400 threads
       | Time: 16.253
       | Time: 16.445
       | Time: 16.590
       | Time: 16.721
       | Time: 16.544
    
       ---- after ----
       |
       | [ARCLinux]# for i in 1 2 3 4 5; do hackbench 10 thread; done
       | Running with 10 groups 400 threads
       | Time: 15.638
       | Time: 15.730
       | Time: 15.870
       | Time: 15.842
       | Time: 15.729
    Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
    3032f0c9
spinlock.h 8.63 KB