Commit b0432d8f authored by Ken Chen, committed by Ingo Molnar

sched: Fix sched-domain avg_load calculation

In find_busiest_group(), the sched-domain avg_load isn't calculated
at all if there is a group imbalance within the domain. This causes
an erroneous imbalance calculation.

The reason is that calculate_imbalance() sees sds->avg_load = 0 and
dumps the entire sds->max_load into the imbalance variable, which is used
later on to migrate the entire load from the busiest CPU to the puller CPU.
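
A minimal sketch of that failure mode in C, under a simplifying assumption:
the imbalance is taken as the smaller of (a) how far the busiest group sits
above the domain average and (b) how far the puller sits below it, both in
unsigned arithmetic. The names toy_stats, toy_min and toy_imbalance are
hypothetical; this is not the kernel's calculate_imbalance(), which also
scales by CPU power.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the scheduler's domain statistics. */
struct toy_stats {
	unsigned long max_load;  /* load of the busiest group */
	unsigned long this_load; /* load of the pulling group */
	unsigned long avg_load;  /* domain average; stays 0 if never computed */
};

static unsigned long toy_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Simplified model only: smaller of "above average" and "below average". */
static unsigned long toy_imbalance(const struct toy_stats *s)
{
	unsigned long above, below;

	/* The busiest group is assumed to be at or above the average. */
	assert(s->max_load >= s->avg_load);

	above = s->max_load - s->avg_load;
	/* Unsigned subtraction: wraps to a huge value when avg_load < this_load. */
	below = s->avg_load - s->this_load;

	return toy_min(above, below);
}
```

With max_load = 3072, this_load = 1024 and avg_load left at 0, the second
term wraps around, so the sketch returns 3072, the entire max_load. With
avg_load = 2048 it returns 1024.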

This has two really bad effects:

1. A stampede of task migrations that cannot break out of the bad
   state because of a positive feedback loop: large load
   delta -> heavier load migration -> larger imbalance, and the cycle
   goes on.

2. Severe imbalance in CPU queue depth.  This causes really long
   scheduling latency blips, which badly affect applications that
   have tight latency requirements.

The fix is to have the kernel calculate the domain avg_load in both cases.
This ensures that the imbalance calculation is always sensible and the
target is usually halfway between the busiest and the puller CPU.
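
As a worked illustration of the formula the patch moves, the sketch below
computes a domain average in C. TOY_LOAD_SCALE (set to 1024 here) and
toy_domain_avg_load are hypothetical stand-ins for SCHED_LOAD_SCALE and the
in-kernel expression; the value 1024 is an assumption made for this example.

```c
#include <assert.h>

/* Assumed fixed-point scale for this sketch; stands in for SCHED_LOAD_SCALE. */
#define TOY_LOAD_SCALE 1024UL

/*
 * Hypothetical helper mirroring the expression in the patch:
 *   avg_load = (SCHED_LOAD_SCALE * total_load) / total_pwr
 */
static unsigned long toy_domain_avg_load(unsigned long total_load,
					 unsigned long total_pwr)
{
	/* A domain with no CPU power would make the division meaningless. */
	assert(total_pwr != 0);

	return (TOY_LOAD_SCALE * total_load) / total_pwr;
}
```

For two groups with loads 3072 and 1024 (total_load = 4096) and a combined
power of 2048, the sketch yields 2048, which is halfway between the two
loads.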
Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20110408002322.3A0D812217F@elm.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
parent 4263a2f1
@@ -3127,6 +3127,8 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
 	if (!sds.busiest || sds.busiest_nr_running == 0)
 		goto out_balanced;
+	sds.avg_load = (SCHED_LOAD_SCALE * sds.total_load) / sds.total_pwr;
 	/*
 	 * If the busiest group is imbalanced the below checks don't
 	 * work because they assumes all things are equal, which typically
@@ -3151,7 +3153,6 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
 	 * Don't pull any tasks if this group is already above the domain
 	 * average load.
 	 */
-	sds.avg_load = (SCHED_LOAD_SCALE * sds.total_load) / sds.total_pwr;
 	if (sds.this_load >= sds.avg_load)
 		goto out_balanced;