• Michael Ellerman's avatar
    powerpc/numa: Fix percpu allocations to be NUMA aware · ba4a648f
    Michael Ellerman authored
    In commit 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
    switched to the generic implementation of cpu_to_node(), which uses a percpu
    variable to hold the NUMA node for each CPU.
    
    Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
    of our percpu areas, leading to a chicken and egg problem. In practice what
    happens is when we are setting up the percpu areas, cpu_to_node() reports that
    all CPUs are on node 0, so we allocate all percpu areas on node 0.
    
    This is visible in the dmesg output, as all pcpu allocs being in group 0:
    
      pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
      pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
      pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
      pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
      pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
      pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
    
    To fix it we need an early_cpu_to_node() which can run prior to percpu being
    setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
    in. With the patch dmesg output shows two groups, 0 and 1:
    
      pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
      pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
      pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
      pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
      pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
      pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
    
    We can also check the data_offset in the paca of various CPUs, with the fix we
    see:
    
      CPU 0:  data_offset = 0x0ffe8b0000
      CPU 24: data_offset = 0x1ffe5b0000
    
    And we can see from dmesg that CPU 24 has an allocation on node 1:
    
      node   0: [mem 0x0000000000000000-0x0000000fffffffff]
      node   1: [mem 0x0000001000000000-0x0000001fffffffff]
    
    Cc: stable@vger.kernel.org # v3.16+
    Fixes: 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
    Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
    Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    ba4a648f
topology.h 2.45 KB