    mm: only IPI CPUs to drain local pages if they exist · 74046494
    Gilad Ben-Yossef authored
    Calculate a cpumask of CPUs with per-cpu pages in any zone and only send
    an IPI requesting CPUs to drain these pages to the buddy allocator if they
    actually have pages when asked to flush.
    
    This patch saves 85%+ of the IPIs asking CPUs to drain per-cpu pages
    under severe memory pressure that leads to OOM. In these cases multiple,
    possibly concurrent, allocation requests end up in the direct reclaim
    code path. The per-cpu pages are reclaimed on the first allocation
    failure, so for most of the subsequent allocation attempts, until the
    memory pressure is off (possibly via the OOM killer), there are no
    per-cpu pages on most CPUs (and there can easily be hundreds of CPUs).
    
    This also has the side effect of shortening the average latency of direct
    reclaim by one or more orders of magnitude, since waiting for all the CPUs
    to ACK the IPI takes a long time.
    
    Tested by running "hackbench 400" on an 8 CPU x86 VM and observing the
    difference between the number of direct reclaim attempts that end up in
    drain_all_pages() and those where more than 1/2 of the online CPUs had any
    per-cpu pages in them, using the vmstat counters introduced in the next
    patch in the series and using /proc/interrupts.
    
    In the test scenario, this was seen to save around 3600 global
    IPIs after triggering an OOM on a concurrent workload:
    
    $ cat /proc/vmstat | tail -n 2
    pcp_global_drain 0
    pcp_global_ipi_saved 0
    
    $ cat /proc/interrupts | grep CAL
    CAL:          1          2          1          2
              2          2          2          2   Function call interrupts
    
    $ hackbench 400
    [OOM messages snipped]
    
    $ cat /proc/vmstat | tail -n 2
    pcp_global_drain 3647
    pcp_global_ipi_saved 3642
    
    $ cat /proc/interrupts | grep CAL
    CAL:          6         13          6          3
              3          3         12          7   Function call interrupts
    
    Please note that if the global drain is removed from the direct reclaim
    path, as a patch from Mel Gorman currently suggests, this should be
    replaced with an on_each_cpu_cond invocation.

    Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
    Acked-by: Mel Gorman <mel@csn.ul.ie>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Acked-by: Christoph Lameter <cl@linux.com>
    Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Andi Kleen <andi@firstfloor.org>
    Acked-by: Michal Nazarewicz <mina86@mina86.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>