[PATCH] normalise node load for NUMA
From: Andrew Theurer <habanero@us.ibm.com>

This patch ensures that when node loads are compared, each load value is normalised by the node's cpu count. Without this, load balancing across nodes with dissimilar cpu counts can cause unfairness and sometimes lower overall performance.

For example, consider a 2-node system with 4 cpus in the first node and 2 cpus in the second. A workload with 6 running tasks would have 3 tasks running on one node and 3 on the other, leaving one cpu idle in the first node and two tasks sharing a cpu in the second node. With the patch, 4 tasks run in the first node and 2 in the second.

I ran some kernel compiles on a 2-node 4-cpu/2-cpu system, with and without this patch, to show the benefit. Without the patch the elapsed time was 140 seconds; with the patch it was 132 seconds (about 6% better).

Although nodes with dissimilar cpu counts are not very common, they already occur: partitioned PPC64 systems have them, and I expect them to become more common on ia32 as partitioning spreads.
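The effect of normalisation on the 4-cpu/2-cpu example can be illustrated with a toy model. This is a hypothetical sketch, not the scheduler code touched by this patch: each new task is simply placed on the node with the lowest load, where the load is either the raw task count or the task count per cpu. The names `struct node`, `node_load`, `place_tasks` and `LOAD_SCALE` are assumptions made for illustration only.

```c
#include <assert.h>

/* Hypothetical fixed-point scale so integer division keeps some precision.
 * The name and value are assumptions for this sketch, not kernel constants. */
#define LOAD_SCALE 128

struct node {
	int nr_cpus;    /* cpus in this node */
	int nr_running; /* runnable tasks currently placed on this node */
};

/* Load as the toy balancer sees it: the raw task count, or
 * (if normalise is set) the scaled number of tasks per cpu. */
static int node_load(const struct node *n, int normalise)
{
	assert(n->nr_cpus > 0);
	if (!normalise)
		return n->nr_running;
	return n->nr_running * LOAD_SCALE / n->nr_cpus;
}

/* Toy placement model (not the kernel balancer): put each new task on
 * the node with the lowest load; ties go to the lower-numbered node. */
static void place_tasks(struct node *nodes, int nr_nodes, int nr_tasks,
			int normalise)
{
	for (int t = 0; t < nr_tasks; t++) {
		int best = 0;

		for (int i = 1; i < nr_nodes; i++)
			if (node_load(&nodes[i], normalise) <
			    node_load(&nodes[best], normalise))
				best = i;
		nodes[best].nr_running++;
	}
}
```

Starting from two empty nodes with 4 and 2 cpus and placing 6 tasks, `place_tasks(nodes, 2, 6, 0)` ends with 3 tasks on each node, while `place_tasks(nodes, 2, 6, 1)` ends with 4 tasks on the first node and 2 on the second, matching the example above.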