1. 06 Oct, 2020 18 commits
  2. 18 Sep, 2020 18 commits
  3. 16 Sep, 2020 4 commits
    • Michael Ellerman's avatar
      Merge coregroup support into next · b5c8a293
      Michael Ellerman authored
      From Srikar's cover letter, with some reformatting:
      
      Cleanup of existing powerpc topologies and add coregroup support on
      powerpc. Coregroup is a group of (subset of) cores of a DIE that share
      a resource.
      
      Summary of some of the testing done with coregroup patchset.
      
      It includes ebizzy, schbench, perf bench sched pipe and topology
      verification. On the left side are results from powerpc/next tree and
      on the right are the results with the patchset applied. Topological
      verification clearly shows that there is no change in topology with
      and without the patches on all the 3 class of systems that were
      tested.
      
      Power 9 PowerNV (2 Node/ 160 Cpu System)
      ----------------------------------------
      
      Baseline                                                                Baseline + Coregroup Support
      
        N      Min       Max    Median       Avg        Stddev                  N      Min       Max    Median       Avg      Stddev
      100   993884   1276090   1173476   1165914     54867.201                100   910470   1279820   1171095   1162091    67363.28
      
      ^ ebizzy (Throughput of 100 iterations of 30 seconds higher throughput is better)
      
      schbench (latency hence lower is better)
      Latency percentiles (usec)                                              Latency percentiles (usec)
              50.0th: 455                                                             50.0th: 454
              75.0th: 533                                                             75.0th: 543
              90.0th: 683                                                             90.0th: 701
              95.0th: 743                                                             95.0th: 737
              *99.0th: 815                                                            *99.0th: 805
              99.5th: 839                                                             99.5th: 835
              99.9th: 913                                                             99.9th: 893
              min=0, max=1011                                                         min=0, max=2833
      
      perf bench sched pipe (lesser time and higher ops/sec is better)
      Running 'sched/pipe' benchmark:                                         Running 'sched/pipe' benchmark:
      Executed 1000000 pipe operations between two processes                  Executed 1000000 pipe operations between two processes
      
           Total time: 6.083 [sec]                                                 Total time: 6.303 [sec]
      
             6.083576 usecs/op                                                       6.303318 usecs/op
               164377 ops/sec                                                          158646 ops/sec
      
      Power 9 LPAR (2 Node/ 128 Cpu System)
      -------------------------------------
      
      Baseline                                                                Baseline + Coregroup Support
      
        N       Min       Max    Median         Avg      Stddev                 N       Min       Max    Median         Avg      Stddev
      100   1058029   1295393   1200414   1188306.7   56786.538               100    943264   1287619   1180522   1168473.2   64469.955
      
      ^ ebizzy (Throughput of 100 iterations of 30 seconds higher throughput is better)
      
      schbench (latency hence lower is better)
      Latency percentiles (usec)                                              Latency percentiles (usec)
              50.0000th: 34                                                           50.0000th: 39
              75.0000th: 46                                                           75.0000th: 52
              90.0000th: 53                                                           90.0000th: 68
              95.0000th: 56                                                           95.0000th: 77
              *99.0000th: 61                                                          *99.0000th: 89
              99.5000th: 63                                                           99.5000th: 94
              99.9000th: 81                                                           99.9000th: 169
              min=0, max=8405                                                         min=0, max=23674
      
      perf bench sched pipe (lesser time and higher ops/sec is better)
      Running 'sched/pipe' benchmark:                                         Running 'sched/pipe' benchmark:
      Executed 1000000 pipe operations between two processes                  Executed 1000000 pipe operations between two processes
      
           Total time: 8.768 [sec]                                                 Total time: 5.217 [sec]
      
             8.768400 usecs/op                                                       5.217625 usecs/op
               114045 ops/sec                                                          191658 ops/sec
      
      Power 8 LPAR (8 Node/ 256 Cpu System)
      -------------------------------------
      
      Baseline                                                                Baseline + Coregroup Support
      
        N       Min       Max    Median         Avg      Stddev                 N      Min      Max   Median        Avg     Stddev
      100   1267615   1965234   1707423   1689137.6   144363.29               100  1175357  1924262  1691104  1664792.1   145876.4
      
      ^ ebizzy (Throughput of 100 iterations of 30 seconds higher throughput is better)
      
      schbench (latency hence lower is better)
      Latency percentiles (usec)                                              Latency percentiles (usec)
              50.0th: 37                                                              50.0th: 36
              75.0th: 51                                                              75.0th: 48
              90.0th: 59                                                              90.0th: 55
              95.0th: 63                                                              95.0th: 59
              *99.0th: 71                                                             *99.0th: 67
              99.5th: 75                                                              99.5th: 72
              99.9th: 105                                                             99.9th: 170
              min=0, max=18560                                                        min=0, max=27031
      
      perf bench sched pipe (lesser time and higher ops/sec is better)
      Running 'sched/pipe' benchmark:                                         Running 'sched/pipe' benchmark:
      Executed 1000000 pipe operations between two processes                  Executed 1000000 pipe operations between two processes
      
           Total time: 6.013 [sec]                                                 Total time: 5.930 [sec]
      
             6.013963 usecs/op                                                       5.930724 usecs/op
               166279 ops/sec                                                          168613 ops/sec
      
      Topology verification on Power9
      Power9 / powernv / SMT4
      
        $ tail /proc/cpuinfo
        cpu             : POWER9, altivec supported
        clock           : 3600.000000MHz
        revision        : 2.2 (pvr 004e 1202)
      
        timebase        : 512000000
        platform        : PowerNV
        model           : 9006-22P
        machine         : PowerNV 9006-22P
        firmware        : OPAL
        MMU             : Radix
      
      Baseline                                                                Baseline + Coregroup Support
      
        lscpu                                                                 lscpu
        ------                                                                ------
        Architecture:        ppc64le                                          Architecture:        ppc64le
        Byte Order:          Little Endian                                    Byte Order:          Little Endian
        CPU(s):              160                                              CPU(s):              160
        On-line CPU(s) list: 0-159                                            On-line CPU(s) list: 0-159
        Thread(s) per core:  4                                                Thread(s) per core:  4
        Core(s) per socket:  20                                               Core(s) per socket:  20
        Socket(s):           2                                                Socket(s):           2
        NUMA node(s):        2                                                NUMA node(s):        2
        Model:               2.2 (pvr 004e 1202)                              Model:               2.2 (pvr 004e 1202)
        Model name:          POWER9, altivec supported                        Model name:          POWER9, altivec supported
        CPU max MHz:         3800.0000                                        CPU max MHz:         3800.0000
        CPU min MHz:         2166.0000                                        CPU min MHz:         2166.0000
        L1d cache:           32K                                              L1d cache:           32K
        L1i cache:           32K                                              L1i cache:           32K
        L2 cache:            512K                                             L2 cache:            512K
        L3 cache:            10240K                                           L3 cache:            10240K
        NUMA node0 CPU(s):   0-79                                             NUMA node0 CPU(s):   0-79
        NUMA node8 CPU(s):   80-159                                           NUMA node8 CPU(s):   80-159
      
        grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name                grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name
        -----------------------------------------------------                 -----------------------------------------------------
        /proc/sys/kernel/sched_domain/cpu0/domain0/name:SMT                   /proc/sys/kernel/sched_domain/cpu0/domain0/name:SMT
        /proc/sys/kernel/sched_domain/cpu0/domain1/name:CACHE                 /proc/sys/kernel/sched_domain/cpu0/domain1/name:CACHE
        /proc/sys/kernel/sched_domain/cpu0/domain2/name:DIE                   /proc/sys/kernel/sched_domain/cpu0/domain2/name:DIE
        /proc/sys/kernel/sched_domain/cpu0/domain3/name:NUMA                  /proc/sys/kernel/sched_domain/cpu0/domain3/name:NUMA
      
        grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags               grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags
        ------------------------------------------------------                ------------------------------------------------------
        /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2391                 /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2391
        /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2327                 /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2327
        /proc/sys/kernel/sched_domain/cpu0/domain2/flags:2071                 /proc/sys/kernel/sched_domain/cpu0/domain2/flags:2071
        /proc/sys/kernel/sched_domain/cpu0/domain3/flags:12801                /proc/sys/kernel/sched_domain/cpu0/domain3/flags:12801
      
      Baseline
      
        head /proc/schedstat
        --------------------
        version 15
        timestamp 4295043536
        cpu0 0 0 0 0 0 0 9597119314 2408913694 11897
        domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain2 00000000,00000000,0000ffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        cpu1 0 0 0 0 0 0 4941435230 11106132 1583
        domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      
      Baseline + Coregroup Support
      
        head /proc/schedstat
        --------------------
        version 15
        timestamp 4296311826
        cpu0 0 0 0 0 0 0 3353674045024 3781680865826 297483
        domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain2 00000000,00000000,0000ffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        cpu1 0 0 0 0 0 0 3337873293332 4231590033856 229090
        domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      
        Post sudo ppc64_cpu --smt=1                                           Post sudo ppc64_cpu --smt=1
        ---------------------                                                 ---------------------
        grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name                grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name
        -----------------------------------------------------                 -----------------------------------------------------
        /proc/sys/kernel/sched_domain/cpu0/domain0/name:CACHE                 /proc/sys/kernel/sched_domain/cpu0/domain0/name:CACHE
        /proc/sys/kernel/sched_domain/cpu0/domain1/name:DIE                   /proc/sys/kernel/sched_domain/cpu0/domain1/name:DIE
        /proc/sys/kernel/sched_domain/cpu0/domain2/name:NUMA                  /proc/sys/kernel/sched_domain/cpu0/domain2/name:NUMA
      
        grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags               grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags
        ------------------------------------------------------                ------------------------------------------------------
        /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2327                 /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2327
        /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2071                 /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2071
        /proc/sys/kernel/sched_domain/cpu0/domain2/flags:12801                /proc/sys/kernel/sched_domain/cpu0/domain2/flags:12801
      
      Baseline:
      
        head /proc/schedstat
        --------------------
        version 15
        timestamp 4295046242
        cpu0 0 0 0 0 0 0 10978610020 2658997390 13068
        domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        cpu4 0 0 0 0 0 0 5408663896 95701034 7697
        domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      
      Baseline + Coregroup Support
      
        head /proc/schedstat
        --------------------
        version 15
        timestamp 4296314905
        cpu0 0 0 0 0 0 0 3355392013536 3781975150576 298723
        domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        cpu4 0 0 0 0 0 0 3351637920996 4427329763050 256776
        domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      
      Similar verification was done on Power 8 (8 Node 256 CPU LPAR) and
      Power 9 (2 node 128 Cpu LPAR) and they showed the topology before and
      after the patch to be identical. If Interested, I could provide the
      same.
      
      On Power 9 (with device-tree enablement to show coregroups):
      
        $ tail /proc/cpuinfo
        processor     : 127
        cpu           : POWER9 (architected), altivec supported
        clock         : 3000.000000MHz
        revision      : 2.2 (pvr 004e 0202)
      
        timebase      : 512000000
        platform      : pSeries
        model         : IBM,9008-22L
        machine       : CHRP IBM,9008-22L
        MMU           : Hash
      
      Before patchset:
      
        $ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
        SMT
        CACHE
        DIE
        NUMA
      
        $ head /proc/schedstat
        version 15
        timestamp 4318242208
        cpu0 0 0 0 0 0 0 28077107004 4773387362 78205
        domain0 00000000,00000000,00000000,00000055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain1 00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain2 00000000,00000000,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain3 ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        cpu1 0 0 0 0 0 0 24177439200 413887604 75393
        domain0 00000000,00000000,00000000,000000aa 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain1 00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      
      After patchset:
      
        $ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
        SMT
        CACHE
        MC
        DIE
        NUMA
      
        $ head /proc/schedstat
        version 15
        timestamp 4318242208
        cpu0 0 0 0 0 0 0 28077107004 4773387362 78205
        domain0 00000000,00000000,00000000,00000055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain1 00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain2 00000000,00000000,00000000,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain3 00000000,00000000,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        domain4 ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        cpu1 0 0 0 0 0 0 24177439200 413887604 75393
        domain0 00000000,00000000,00000000,000000aa 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      b5c8a293
    • Srikar Dronamraju's avatar
      powerpc/smp: Implement cpu_to_coregroup_id · fa35e868
      Srikar Dronamraju authored
      Lookup the coregroup id from the associativity array.
      
      If unable to detect the coregroup id, fallback on the core id.
      This way, ensure sched_domain degenerates and an extra sched domain is
      not created.
      
      Ideally this function should have been implemented in
      arch/powerpc/kernel/smp.c. However if its implemented in mm/numa.c, we
      don't need to find the primary domain again.
      
      If the device-tree mentions more than one coregroup, then kernel
      implements only the last or the smallest coregroup, which currently
      corresponds to the penultimate domain in the device-tree.
      Signed-off-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Reviewed-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200810071834.92514-11-srikar@linux.vnet.ibm.com
      fa35e868
    • Srikar Dronamraju's avatar
      powerpc/smp: Create coregroup domain · 72730bfc
      Srikar Dronamraju authored
      Add percpu coregroup maps and masks to create coregroup domain.
      If a coregroup doesn't exist, the coregroup domain will be degenerated
      in favour of SMT/CACHE domain. Do note this patch is only creating stubs
      for cpu_to_coregroup_id. The actual cpu_to_coregroup_id implementation
      would be in a subsequent patch.
      Signed-off-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Reviewed-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200810071834.92514-10-srikar@linux.vnet.ibm.com
      72730bfc
    • Srikar Dronamraju's avatar
      powerpc/smp: Allocate cpumask only after searching thread group · 6e086302
      Srikar Dronamraju authored
      If allocated earlier and the search fails, then cpu_l1_cache_map cpumask
      is unnecessarily cleared. However cpu_l1_cache_map can be allocated /
      cleared after we search thread group.
      
      Please note CONFIG_CPUMASK_OFFSTACK is not set on Powerpc. Hence cpumask
      allocated by zalloc_cpumask_var_node is never freed.
      Signed-off-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Reviewed-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200810071834.92514-9-srikar@linux.vnet.ibm.com
      6e086302