1. 11 Apr, 2011 2 commits
    • Ken Chen's avatar
      sched: Fix erroneous all_pinned logic · b30aef17
      Ken Chen authored
      The scheduler load balancer has specific code to deal with cases of
      unbalanced system due to lots of unmovable tasks (for example because of
      hard CPU affinity). In those situation, it excludes the busiest CPU that
      has pinned tasks for load balance consideration such that it can perform
      second 2nd load balance pass on the rest of the system.
      
      This all works as designed if there is only one cgroup in the system.
      
      However, when we have multiple cgroups, this logic has false positives and
      triggers multiple load balance passes despite there are actually no pinned
      tasks at all.
      
      The reason it has false positives is that the all pinned logic is deep in
      the lowest function of can_migrate_task() and is too low level:
      
      load_balance_fair() iterates each task group and calls balance_tasks() to
      migrate target load. Along the way, balance_tasks() will also set a
      all_pinned variable. Given that task-groups are iterated, this all_pinned
      variable is essentially the status of last group in the scanning process.
      Task group can have number of reasons that no load being migrated, none
      due to cpu affinity. However, this status bit is being propagated back up
      to the higher level load_balance(), which incorrectly think that no tasks
      were moved.  It kick off the all pinned logic and start multiple passes
      attempt to move load onto puller CPU.
      
      To fix this, move the all_pinned aggregation up at the iterator level.
      This ensures that the status is aggregated over all task-groups, not just
      last one in the list.
      Signed-off-by: default avatarKen Chen <kenchen@google.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/BANLkTi=ernzNawaR5tJZEsV_QVnfxqXmsQ@mail.gmail.comSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b30aef17
    • Ken Chen's avatar
      sched: Fix sched-domain avg_load calculation · b0432d8f
      Ken Chen authored
      In function find_busiest_group(), the sched-domain avg_load isn't
      calculated at all if there is a group imbalance within the domain. This
      will cause erroneous imbalance calculation.
      
      The reason is that calculate_imbalance() sees sds->avg_load = 0 and it
      will dump entire sds->max_load into imbalance variable, which is used
      later on to migrate entire load from busiest CPU to the puller CPU.
      
      This has two really bad effect:
      
      1. stampede of task migration, and they won't be able to break out
         of the bad state because of positive feedback loop: large load
         delta -> heavier load migration -> larger imbalance and the cycle
         goes on.
      
      2. severe imbalance in CPU queue depth.  This causes really long
         scheduling latency blip which affects badly on application that
         has tight latency requirement.
      
      The fix is to have kernel calculate domain avg_load in both cases. This
      will ensure that imbalance calculation is always sensible and the target
      is usually half way between busiest and puller CPU.
      Signed-off-by: default avatarKen Chen <kenchen@google.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/20110408002322.3A0D812217F@elm.corp.google.comSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b0432d8f
  2. 10 Apr, 2011 1 commit
  3. 09 Apr, 2011 3 commits
  4. 08 Apr, 2011 4 commits
  5. 07 Apr, 2011 19 commits
  6. 06 Apr, 2011 11 commits