sched: Fix SCHED_MC regression caused by change in sched cpu_power
On platforms like dual socket quad-core platform, the scheduler load balancer is not detecting the load imbalances in certain scenarios. This is leading to scenarios like where one socket is completely busy (with all the 4 cores running with 4 tasks) and leaving another socket completely idle. This causes performance issues as those 4 tasks share the memory controller, last-level cache bandwidth etc. Also we won't be taking advantage of turbo-mode as much as we would like, etc. Some of the comparisons in the scheduler load balancing code are comparing the "weighted cpu load that is scaled wrt sched_group's cpu_power" with the "weighted average load per task that is not scaled wrt sched_group's cpu_power". While this has probably been broken for a longer time (for multi socket numa nodes etc), the problem got aggrevated via this recent change: | | commit f93e65c1 | Author: Peter Zijlstra <a.p.zijlstra@chello.nl> | Date: Tue Sep 1 10:34:32 2009 +0200 | | sched: Restore __cpu_power to a straight sum of power | Also with this change, the sched group cpu power alone no longer reflects the group capacity that is needed to implement MC, MT performance (default) and power-savings (user-selectable) policies. We need to use the computed group capacity (sgs.group_capacity, that is computed using the SD_PREFER_SIBLING logic in update_sd_lb_stats()) to find out if the group with the max load is above its capacity and how much load to move etc. Reported-by: Ma Ling <ling.ma@intel.com> Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> [ -v2: build fix ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> # [2.6.32.x, 2.6.33.x] LKML-Reference: <1266970432.11588.22.camel@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Showing
Please register or sign in to comment