    sched/fair: Consider RT/IRQ pressure in capacity_spare_wake() · f453ae22
    Joel Fernandes authored
capacity_spare_wake() in the slow path influences the choice of the idlest
group, as we search for the group with maximum spare capacity. In scenarios
where RT pressure is high, a sub-optimal group can be chosen, hurting the
performance of the task being woken up.

Fix this by using capacity_of() instead of capacity_orig_of() in
capacity_spare_wake().
    
Test results showing improvements from this change are below. More tests
were also done by myself and Matt Fleming to ensure no degradation in
different benchmarks.
    
1) Rohit ran the barrier.c test (details below) with the following improvements:
------------------------------------------------------------------------
This was Rohit's original use case for a patch he posted at [1]; however,
his recent tests showed that this patch can replace his slow-path changes
[1], and there's no need to selectively scan/skip CPUs in
find_idlest_group_cpu() in the slow path to get the improvement he sees.
    
barrier.c (OpenMP code) is used as a micro-benchmark. It does a number of
iterations with a barrier sync at the end of each for loop.

Here barrier.c runs alongside ping on CPUs 0 and 1 as:
'ping -l 10000 -q -s 10 -f hostX'
    
    barrier.c can be found at:
    http://www.spinics.net/lists/kernel/msg2506955.html
    
Following are the results for iterations per second with this
micro-benchmark (higher is better), on a 2-socket, 44-core, 88-thread
Intel x86 machine:
    +--------+------------------+---------------------------+
    |Threads | Without patch    | With patch                |
    |        |                  |                           |
    +--------+--------+---------+-----------------+---------+
    |        | Mean   | Std Dev | Mean            | Std Dev |
    +--------+--------+---------+-----------------+---------+
    |1       | 539.36 | 60.16   | 572.54 (+6.15%) | 40.95   |
    |2       | 481.01 | 19.32   | 530.64 (+10.32%)| 56.16   |
    |4       | 474.78 | 22.28   | 479.46 (+0.99%) | 18.89   |
    |8       | 450.06 | 24.91   | 447.82 (-0.50%) | 12.36   |
    |16      | 436.99 | 22.57   | 441.88 (+1.12%) | 7.39    |
    |32      | 388.28 | 55.59   | 429.4  (+10.59%)| 31.14   |
    |64      | 314.62 | 6.33    | 311.81 (-0.89%) | 11.99   |
    +--------+--------+---------+-----------------+---------+
    
2) ping+hackbench test on a bare-metal server (by Rohit)
--------------------------------------------------------
Here hackbench runs in threaded mode alongside
ping on CPUs 0 and 1 as:
'ping -l 10000 -q -s 10 -f hostX'
    
This test runs on a 2-socket, 20-core, 40-thread Intel x86 machine.
The number of loops is 10000 and runtime is in seconds (lower is better).
    
    +--------------+-----------------+--------------------------+
    |Task Groups   | Without patch   |  With patch              |
    |              +-------+---------+----------------+---------+
    |(Groups of 40)| Mean  | Std Dev |  Mean          | Std Dev |
    +--------------+-------+---------+----------------+---------+
    |1             | 0.851 | 0.007   |  0.828 (+2.77%)| 0.032   |
    |2             | 1.083 | 0.203   |  1.087 (-0.37%)| 0.246   |
    |4             | 1.601 | 0.051   |  1.611 (-0.62%)| 0.055   |
    |8             | 2.837 | 0.060   |  2.827 (+0.35%)| 0.031   |
    |16            | 5.139 | 0.133   |  5.107 (+0.63%)| 0.085   |
    |25            | 7.569 | 0.142   |  7.503 (+0.88%)| 0.143   |
    +--------------+-------+---------+----------------+---------+
    
    [1] https://patchwork.kernel.org/patch/9991635/
    
Matt Fleming also ran several different hackbench tests and cyclictest
to sanity-check that the patch doesn't harm other use cases.
Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Tested-by: Rohit Jain <rohit.k.jain@oracle.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    Cc: Atish Patra <atish.patra@oracle.com>
    Cc: Brendan Jackman <brendan.jackman@arm.com>
    Cc: Chris Redpath <Chris.Redpath@arm.com>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Juri Lelli <juri.lelli@arm.com>
    Cc: Len Brown <lenb@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
    Cc: Patrick Bellasi <patrick.bellasi@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
    Cc: Saravana Kannan <skannan@quicinc.com>
    Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
    Cc: Steve Muckle <smuckle@google.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vikram Mulukutla <markivx@codeaurora.org>
    Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20171214212158.188190-1-joelaf@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>