    cpuops: Use cmpxchg for xchg to avoid lock semantics · 8270137a
    Christoph Lameter authored
    Use cmpxchg instead of xchg to implement this_cpu_xchg.
    
    On x86, xchg with a memory operand always implies LOCK and so always
    pays the LOCK overhead; cmpxchg without a LOCK prefix does not.
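
The idea above can be sketched in user-space C as follows. This is only an illustration and not the kernel macro from this commit: the helper names `cmpxchg_nolock` and `xchg_via_cmpxchg` are hypothetical, the segment prefix used for per-cpu addressing is left out, and the non-x86 fallback is a plain C stand-in. Without LOCK the sequence is not atomic against other CPUs, which is acceptable only because the data is assumed to be touched by one CPU.

```c
#include <stdio.h>

/* Hypothetical helper: compare-and-exchange WITHOUT a LOCK prefix.
 * Returns the value found at *ptr; the store happens only if it equals old. */
static inline unsigned long cmpxchg_nolock(unsigned long *ptr,
                                           unsigned long old,
                                           unsigned long nval)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned long prev;
    /* cmpxchg compares the accumulator with *ptr and stores nval on a match.
     * No "lock" prefix is emitted, so there is no bus-lock overhead. */
    __asm__ volatile("cmpxchg %2, %1"
                     : "=a"(prev), "+m"(*ptr)
                     : "r"(nval), "0"(old)
                     : "memory", "cc");
    return prev;
#else
    /* Plain C stand-in for other architectures (assumption). */
    unsigned long prev = *ptr;
    if (prev == old)
        *ptr = nval;
    return prev;
#endif
}

/* Hypothetical helper: exchange built from a cmpxchg retry loop.
 * Returns the old value and leaves nval in *ptr. */
static inline unsigned long xchg_via_cmpxchg(unsigned long *ptr,
                                             unsigned long nval)
{
    unsigned long old = *ptr;      /* initial guess at the current value */

    for (;;) {
        unsigned long prev = cmpxchg_nolock(ptr, old, nval);
        if (prev == old)
            return old;            /* guess matched, nval was stored */
        old = prev;                /* value changed (e.g. interrupt), retry */
    }
}
```

Calling `xchg_via_cmpxchg(&v, 9)` on a variable holding 5 returns 5 and leaves 9 in `v`.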
    
    Baselines:
    
    xchg()		= 18 cycles (no segment prefix, LOCK semantics)
    __this_cpu_xchg = 1 cycle
    
    (simulated using this_cpu_read/write, so two segment prefixes. It looks
    like the CPU can use loop optimization to get rid of most of the overhead.)
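
For comparison, the simulated baseline can be pictured as a separate read followed by a write. The sketch below is an assumption about what "simulated using this_cpu_read/write" means; `xchg_read_write` is a hypothetical name, and plain loads and stores stand in for the segment-prefixed per-cpu accessors. It is not safe against an interrupt arriving between the two steps.

```c
/* Hypothetical stand-in for the simulated __this_cpu_xchg baseline:
 * one load and one store, with no LOCK and no cmpxchg. */
static inline unsigned long xchg_read_write(unsigned long *ptr,
                                            unsigned long nval)
{
    unsigned long old = *ptr;   /* stands in for this_cpu_read() */

    *ptr = nval;                /* stands in for this_cpu_write() */
    return old;
}
```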
    
    Cycles before:
    
    this_cpu_xchg	 = 37 cycles (segment prefix and LOCK (implied by xchg))
    
    After:
    
    this_cpu_xchg	= 11 cycles (using cmpxchg without lock semantics)
    
    Signed-off-by: Christoph Lameter <cl@linux.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
percpu.h 18.3 KB