    x86: Spread tlb flush vector between nodes · 93296720
    Shaohua Li authored
    Currently the flush tlb vector allocation is based on the equation below:
    	sender = smp_processor_id() % 8
    This isn't optimal: CPUs from different nodes can end up with the same
    vector, which causes a lot of lock contention. Instead, we can assign the
    same vectors to CPUs from the same node, while different nodes get
    different vectors (a sketch of such an assignment follows the list below).
    This has the following advantages:
    a. If there is lock contention, it is between CPUs within one node, which
    should be much cheaper than contention between nodes.
    b. Lock contention between nodes is avoided completely. This especially
    benefits kswapd, which is the biggest user of tlb flush, since kswapd sets
    its affinity to a specific node.
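
    To illustrate the idea, here is a minimal, self-contained user-space sketch
    of such a per-node vector assignment. It is not the actual patch; the
    8-vector limit, the 4-node/16-CPU topology and the helper names are
    assumptions made for the example:

    	#include <stdio.h>

    	#define NR_TLB_VECTORS	8	/* assumed vector count, as in cpu % 8 */
    	#define NR_NODES	4	/* assumed topology: 4 nodes ...        */
    	#define CPUS_PER_NODE	16	/* ... with 16 CPUs each                */

    	/* Old scheme: the vector depends only on the CPU id, so CPUs on
    	 * different nodes can share (and contend on) the same vector. */
    	static int old_vector(int cpu)
    	{
    		return cpu % NR_TLB_VECTORS;
    	}

    	/* New scheme (sketch): split the vectors into per-node slices and
    	 * spread each node's CPUs across its own slice only. */
    	static int node_vector(int node, int cpu_in_node)
    	{
    		int nr_node_vecs = NR_TLB_VECTORS / NR_NODES;

    		if (nr_node_vecs == 0)	/* more nodes than vectors: fall back */
    			nr_node_vecs = 1;

    		return (node * nr_node_vecs) % NR_TLB_VECTORS +
    		       cpu_in_node % nr_node_vecs;
    	}

    	int main(void)
    	{
    		int node, cpu;

    		for (node = 0; node < NR_NODES; node++)
    			for (cpu = 0; cpu < CPUS_PER_NODE; cpu++)
    				printf("node %d cpu %2d: old vector %d, new vector %d\n",
    				       node, cpu,
    				       old_vector(node * CPUS_PER_NODE + cpu),
    				       node_vector(node, cpu));
    		return 0;
    	}

    With 8 vectors and 4 nodes, each node gets a disjoint pair of vectors
    (node 0 -> 0/1, node 1 -> 2/3, ...), whereas the old cpu % 8 scheme lets
    every node touch all 8 vectors.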
    
    In my test, this reduces CPU overhead by more than 20% in the extreme case.
    The test machine has 4 nodes and each node has 16 CPUs. I bind each node's
    kswapd to the first CPU of the node and run a workload with 4 sequential
    mmap file read threads. The files are empty sparse files. This workload
    triggers a lot of page reclaim and tlb flushing. Binding kswapd makes it
    easy to trigger the extreme tlb flush lock contention, because otherwise
    kswapd keeps migrating between the CPUs of a node and I can't get stable
    results. Of course, real workloads will not always show such heavy tlb
    flush lock contention, but it is possible.
    
    [ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ]
    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    LKML-Reference: <1287544023.4571.8.camel@sli10-conroe.sh.intel.com>
    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>