    powerpc/smp: Update cpu_core_map on all PowerPc systems · b8b92803
    Srikar Dronamraju authored
    lscpu uses core_siblings to list the number of sockets in the
    system. core_siblings is set using topology_core_cpumask.
    
    While optimizing the powerpc bootup path in Commit 4ca234a9
    ("powerpc/smp: Stop updating cpu_core_mask"), it was found that
    updating cpu_core_mask() took a lot of time. It was thought
    that on PowerPc, cpu_core_mask() would always be the same as
    cpu_cpu_mask(), i.e. the number of sockets would always be equal to the
    number of nodes. As an optimization, cpu_core_mask() was made a snapshot
    of cpu_cpu_mask().
    
    However that was found to be false with PowerPc KVM guests, where each
    node could have more than one socket. So with Commit c47f892d
    ("powerpc/smp: Reintroduce cpu_core_mask"), cpu_core_mask was updated
    based on chip_id but in an optimized way using some mask manipulations
    and chip_id caching.
    
    However, systems other than PowerNV and pseries KVM guests (i.e. those
    not implementing cpu_to_chip_id()) continued to use a copy of
    cpu_cpu_mask().
    
    There are two issues that were noticed on such systems:
    1. lscpu would report one extra socket.
    On an IBM,9009-42A (aka zz system), which has only 2 chips / sockets /
    nodes, lscpu would report:
    Architecture:        ppc64le
    Byte Order:          Little Endian
    CPU(s):              160
    On-line CPU(s) list: 0-159
    Thread(s) per core:  8
    Core(s) per socket:  6
    Socket(s):           3                <--------------
    NUMA node(s):        2
    Model:               2.2 (pvr 004e 0202)
    Model name:          POWER9 (architected), altivec supported
    Hypervisor vendor:   pHyp
    Virtualization type: para
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            512K
    L3 cache:            10240K
    NUMA node0 CPU(s):   0-79
    NUMA node1 CPU(s):   80-159
    
    2. Currently cpu_cpu_mask is updated when a core is
    added/removed. However, it is not updated on SMT mode switching or when
    CPUs are explicitly offlined, whereas all other percpu masks are
    updated to ensure only active/online CPUs are in the masks.
    This results in build_sched_domain traces, since there will be CPUs in
    cpu_cpu_mask() that are not present in the SMT / CACHE / MC /
    NUMA domains. A loop of threads running SMT mode switching and core
    add/remove will soon show this trace.
    Hence cpu_cpu_mask has to be updated at SMT mode switch.
    
    This will have an impact on cpu_core_mask(). cpu_core_mask() is a
    snapshot of cpu_cpu_mask. Different CPUs within the same socket will
    end up having different cpu_core_masks since they are snapshots taken at
    different points in time. This means lscpu will start reporting
    many more sockets than the actual number of sockets / nodes / chips.
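
    The divergence described above can be illustrated with a small
    userspace sketch. This is not kernel code: the 8-CPU single-node
    layout, the toy_mask_t bitmask type, and every function name are
    assumptions made for illustration, and counting distinct per-CPU
    masks is only a simplified stand-in for how a tool such as lscpu
    derives a socket count from core_siblings.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 8

/* Toy cpumask (hypothetical): bit n set means CPU n is in the mask. */
typedef uint64_t toy_mask_t;

/*
 * Count the distinct core masks held by online CPUs. Each distinct
 * mask is treated as one socket, as a simplified model of a tool that
 * groups CPUs by their core_siblings value.
 */
int toy_count_sockets(const toy_mask_t core_mask[TOY_NR_CPUS],
                      toy_mask_t online)
{
    int sockets = 0;

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        int seen = 0;

        if (!((online >> cpu) & 1))
            continue;
        /* Has an earlier online CPU already reported this mask? */
        for (int prev = 0; prev < cpu; prev++) {
            if (((online >> prev) & 1) &&
                core_mask[prev] == core_mask[cpu]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            sockets++;
    }
    return sockets;
}

/*
 * One node holding CPUs 0-7. CPUs 0-3 snapshot the node mask while only
 * they are online; CPUs 4-7 come online later and snapshot the larger
 * mask. The earlier snapshots are never refreshed.
 */
int toy_sockets_with_stale_snapshots(void)
{
    toy_mask_t core_mask[TOY_NR_CPUS] = {0};
    toy_mask_t node_mask = 0x0F;

    for (int cpu = 0; cpu < 4; cpu++)
        core_mask[cpu] = node_mask;     /* snapshot taken early */

    node_mask = 0xFF;                   /* CPUs 4-7 are now online */
    for (int cpu = 4; cpu < TOY_NR_CPUS; cpu++)
        core_mask[cpu] = node_mask;     /* snapshot taken late */

    return toy_count_sockets(core_mask, node_mask);
}

/* Same node, but every CPU holds the current node mask. */
int toy_sockets_with_current_masks(void)
{
    toy_mask_t core_mask[TOY_NR_CPUS];
    toy_mask_t node_mask = 0xFF;

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        core_mask[cpu] = node_mask;

    return toy_count_sockets(core_mask, node_mask);
}
```

    In this model the stale snapshots yield two apparent sockets for a
    single node, while consistent masks yield one.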
    
    Different ways to handle this problem:
    A. Update the snapshot aka cpu_core_mask for all CPUs whenever
       cpu_cpu_mask is updated. This would be a non-optimal solution.
    B. Instead of a cpumask_var_t, make cpu_core_map a cpumask pointer
       pointing to cpu_cpu_mask. However, a percpu cpumask pointer is frowned
       upon and we need a clean way to handle PowerPc KVM guests, where the
       mask is not a snapshot.
    C. Update cpu_core_mask on all PowerPc systems, as is done for PowerPc
       KVM guests, using mask manipulations. This approach is relatively
       simple and unifies with the existing code.
    D. On top of C, we could also resurrect get_physical_package_id, which
       could return a nid for the said CPU. However, this is not needed at
       this time.
    
    Option C is the preferred approach for now.
    
    While this is somewhat a revert of Commit 4ca234a9 ("powerpc/smp:
    Stop updating cpu_core_mask"), it is not a plain revert because:
    
    1. A plain revert has some conflicts.
    2. For chip_id == -1, the cpu_core_mask is made identical to
    cpu_cpu_mask, unlike previously where cpu_core_mask was set to a core
    if chip_id doesn't exist.
    
    This goes by the principle that if chip_id is not exposed, then
    sockets / chip / node share the same set of CPUs.
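
    The principle above can be sketched in userspace C. This is a toy
    model, not the code in smp.c: struct toy_cpu, the two example
    topologies, and all function names are hypothetical, and the real
    kernel works on cpumask_var_t with its own helpers. The sketch only
    shows the rule that a CPU without a chip_id gets a core mask equal
    to its node mask, recomputed from the CPUs that are currently online.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 8

/* Toy cpumask (hypothetical): bit n set means CPU n is in the mask. */
typedef uint64_t toy_mask_t;

struct toy_cpu {
    int node;       /* NUMA node the CPU belongs to */
    int chip_id;    /* -1 when the platform does not expose a chip id */
};

/* Example topology: two nodes, no chip ids exposed. */
const struct toy_cpu toy_no_chip_ids[TOY_NR_CPUS] = {
    {0, -1}, {0, -1}, {0, -1}, {0, -1},
    {1, -1}, {1, -1}, {1, -1}, {1, -1},
};

/* Example topology: two nodes, two chips (sockets) per node. */
const struct toy_cpu toy_two_chips_per_node[TOY_NR_CPUS] = {
    {0, 0}, {0, 0}, {0, 1}, {0, 1},
    {1, 2}, {1, 2}, {1, 3}, {1, 3},
};

/* Online CPUs that share a node with @cpu. */
toy_mask_t toy_node_mask(const struct toy_cpu cpus[TOY_NR_CPUS],
                         toy_mask_t online, int cpu)
{
    toy_mask_t mask = 0;

    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    for (int i = 0; i < TOY_NR_CPUS; i++) {
        if (((online >> i) & 1) && cpus[i].node == cpus[cpu].node)
            mask |= (toy_mask_t)1 << i;
    }
    return mask;
}

/*
 * Core (socket) mask for @cpu, derived from current state rather than
 * copied from an earlier snapshot.
 */
toy_mask_t toy_core_mask(const struct toy_cpu cpus[TOY_NR_CPUS],
                         toy_mask_t online, int cpu)
{
    toy_mask_t mask = 0;

    assert(cpu >= 0 && cpu < TOY_NR_CPUS);

    /* No chip id: socket, chip and node share the same set of CPUs. */
    if (cpus[cpu].chip_id == -1)
        return toy_node_mask(cpus, online, cpu);

    /* Chip id available: group the online CPUs that share it. */
    for (int i = 0; i < TOY_NR_CPUS; i++) {
        if (((online >> i) & 1) && cpus[i].chip_id == cpus[cpu].chip_id)
            mask |= (toy_mask_t)1 << i;
    }
    return mask;
}
```

    For example, with toy_no_chip_ids and CPU 3 offline (online mask
    0xF7), CPUs 0 and 1 both see the core mask 0x07, so CPUs in the
    same node never disagree about their socket.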
    
    With the fix, lscpu output would be:
    Architecture:        ppc64le
    Byte Order:          Little Endian
    CPU(s):              160
    On-line CPU(s) list: 0-159
    Thread(s) per core:  8
    Core(s) per socket:  6
    Socket(s):           2                     <--------------
    NUMA node(s):        2
    Model:               2.2 (pvr 004e 0202)
    Model name:          POWER9 (architected), altivec supported
    Hypervisor vendor:   pHyp
    Virtualization type: para
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            512K
    L3 cache:            10240K
    NUMA node0 CPU(s):   0-79
    NUMA node1 CPU(s):   80-159
    
    Fixes: 4ca234a9 ("powerpc/smp: Stop updating cpu_core_mask")
    Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210826100401.412519-3-srikar@linux.vnet.ibm.com
smp.c 41.9 KB