    sched: Improve load balancing in the presence of idle CPUs · d4573c3e
    Preeti U Murthy authored
    When a CPU is kicked to do nohz idle balancing, it wakes up to do load
    balancing on itself, followed by load balancing on behalf of idle CPUs.
    But it may end up with load after the load balancing attempt on itself,
    which aborts nohz idle balancing. As a result several idle CPUs are left
    without tasks until an ILB (idle load balancer) CPU finds it unfavorable
    to pull tasks upon itself. This delays spreading of load across idle CPUs
    and, worse, clutters only a few CPUs with tasks.
    
    The effect of the above problem was observed on an SMT8 POWER server
    with 2 levels of NUMA domains. As many busy loops as there are cores
    were spawned. Since load balancing on fork/exec is discouraged across
    NUMA domains, all busy loops would start on one of the NUMA domains.
    The expectation was that nohz idle load balancing would eventually
    leave one busy loop running per core across all domains. Instead, it
    took as long as 10 seconds to spread the load across NUMA domains.
    
    Further investigation showed that this was a consequence of the
    following:
    
     1. An ILB CPU was chosen from the first NUMA domain to trigger nohz idle
        load balancing. [Given the experiment, up to 6 CPUs per core could
        potentially be idle in this domain.]
    
     2. However the ILB CPU would call load_balance() on itself before
        initiating nohz idle load balancing.
    
     3. Given cores are SMT8, the ILB CPU had enough opportunities to pull
        tasks from its sibling cores to even out load.
    
     4. Now that the ILB CPU was no longer idle, it would abort nohz idle
        load balancing.
    
    As a result, the opportunities to spread load across NUMA domains were
    lost until the cores within the first NUMA domain had an equal number
    of tasks among themselves. This is a pretty bad scenario, since the
    cores within the first NUMA domain would have as many as 4 tasks each,
    while cores in the neighbouring NUMA domains would all remain idle.
    
    Fix this by checking whether a CPU was woken up to do nohz idle load
    balancing before it does load balancing upon itself. This way we allow
    idle CPUs across the system to do load balancing, which results in a
    quicker spread of load, instead of performing load balancing within the
    local sched domain hierarchy of the ILB CPU alone under circumstances
    such as the above.
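
The effect of the ordering can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the kernel code: each CPU is reduced to a task count, an idle CPU pulls at most one task from the busiest CPU, and the names `pull_one_task()` and `ilb_kick()` are hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (assumption): tasks[i] is the number of runnable tasks on CPU i.
 * Move one task to 'dst' from the busiest other CPU, provided that CPU has
 * at least two tasks. Returns true if a task was moved.
 */
static bool pull_one_task(int *tasks, size_t ncpus, size_t dst)
{
    size_t busiest = dst;

    for (size_t i = 0; i < ncpus; i++)
        if (i != dst && tasks[i] > tasks[busiest])
            busiest = i;

    if (busiest == dst || tasks[busiest] < 2)
        return false;

    tasks[busiest]--;
    tasks[dst]++;
    return true;
}

/*
 * Simulate one kick of the ILB CPU 'ilb' and return how many CPUs are
 * still idle afterwards.
 *
 * nohz_first == false: balance the ILB CPU itself first; if it ends up
 *                      with load, abort balancing for the other idle CPUs.
 * nohz_first == true:  balance on behalf of the other idle CPUs first,
 *                      then balance the ILB CPU itself.
 */
size_t ilb_kick(int *tasks, size_t ncpus, size_t ilb, bool nohz_first)
{
    bool abort_nohz = false;
    size_t idle = 0;

    assert(ilb < ncpus);

    if (!nohz_first) {
        pull_one_task(tasks, ncpus, ilb);
        abort_nohz = tasks[ilb] > 0;    /* ILB CPU is no longer idle */
    }

    if (!abort_nohz)
        for (size_t i = 0; i < ncpus; i++)
            if (i != ilb && tasks[i] == 0)
                pull_one_task(tasks, ncpus, i);

    if (nohz_first && tasks[ilb] == 0)
        pull_one_task(tasks, ncpus, ilb);

    for (size_t i = 0; i < ncpus; i++)
        if (tasks[i] == 0)
            idle++;

    return idle;
}
```

With four CPUs, all four tasks on CPU 1 and CPU 0 as the ILB CPU, the self-first ordering leaves two CPUs idle after one kick, while the nohz-first ordering leaves none idle in this model.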

    Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Jason Low <jason.low2@hp.com>
    Cc: benh@kernel.crashing.org
    Cc: daniel.lezcano@linaro.org
    Cc: efault@gmx.de
    Cc: iamjoonsoo.kim@lge.com
    Cc: morten.rasmussen@arm.com
    Cc: pjt@google.com
    Cc: riel@redhat.com
    Cc: srikar@linux.vnet.ibm.com
    Cc: svaidy@linux.vnet.ibm.com
    Cc: tim.c.chen@linux.intel.com
    Cc: vincent.guittot@linaro.org
    Link: http://lkml.kernel.org/r/20150326130014.21532.17158.stgit@preeti.in.ibm.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>