1. 02 Oct, 2018 9 commits
    • Dietmar Eggemann's avatar
      sched/fair: Disable LB_BIAS by default · fdf5f315
      Dietmar Eggemann authored
      LB_BIAS allows the adjustment on how conservative load should be
      balanced.
      
      The rq->cpu_load[idx] array is used for this functionality. It contains
      weighted CPU load decayed average values over different intervals
      (idx = 1..4). Idx = 0 is the weighted CPU load itself.
      
      The values are updated during scheduler_tick, before idle balance and at
      nohz exit.
      
      There are 5 different types of idx's per sched domain (sd). Each of them
      is used to index into the rq->cpu_load[idx] array in a specific scenario
      (busy, idle and newidle for load balancing, forkexec for wake-up
      slow-path load balancing and wake for affine wakeup based on weight).
      Only the sd idx's for busy and idle load balancing are set to 2,3 or 1,2
      respectively. All the other sd idx's are set to 0.
      
      Conservative load balancing is achieved for sd idx's >= 1 by using the
      min/max (source_load()/target_load()) value between the current weighted
      CPU load and the rq->cpu_load[sd idx -1] for the busiest(idlest)/local
      CPU load in load balancing or vice versa in the wake-up slow-path load
      balancing.
      There is no conservative balancing for sd idx = 0 since only current
      weighted CPU load is used in this case.
      
      It is very likely that LB_BIAS' influence on load balancing can be
      neglected (see test results below). This is further supported by:
      
      (1) Weighted CPU load today is by itself a decayed average value (PELT)
          (cfs_rq->avg->runnable_load_avg) and not the instantaneous load
          (rq->load.weight) it was when LB_BIAS was introduced.
      
      (2) Sd imbalance_pct is used for CPU_NEWLY_IDLE and CPU_NOT_IDLE (relate
          to sd's newidle and busy idx) in find_busiest_group() when comparing
          busiest and local avg load to make load balancing even more
          conservative.
      
      (3) The sd forkexec and newidle idx are always set to 0 so there is no
          adjustment on how conservatively load balancing is done here.
      
      (4) Affine wakeup based on weight (wake_affine_weight()) will not be
          impacted since the sd wake idx is always set to 0.
      
      Let's disable LB_BIAS by default for a few kernel releases to make sure
      that no workload and no scheduler topology is affected. The benefit of
      being able to remove the LB_BIAS dependency from source_load() and
      target_load() is that the entire rq->cpu_load[idx] code could be removed
      in this case.
      
      It is really hard to say if there is no regression w/o testing this with
      a lot of different workloads on a lot of different platforms, especially
      NUMA machines.
      The following 104 LKP (Linux Kernel Performance) tests were run by the
      0-Day guys mostly on multi-socket hosts with a larger number of logical
      cpus (88, 192).
      The base for the test was commit b3dae109 ("sched/swait: Rename to
      exclusive") (tip/sched/core v4.18-rc1).
      Only 2 out of the 104 tests had a significant change in one of the
      metrics (fsmark/1x-1t-1HDD-btrfs-nfsv4-4M-60G-NoSync-performance +7%
      files_per_sec, unixbench/300s-100%-syscall-performance -11% score).
      Tests which showed a change in one of the metrics are marked with a '*'
      and this change is listed as well.
      
      (a) lkp-bdw-ep3:
            88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 64G
      
          dd-write/10m-1HDD-cfq-btrfs-100dd-performance
          fsmark/1x-1t-1HDD-xfs-nfsv4-4M-60G-NoSync-performance
        * fsmark/1x-1t-1HDD-btrfs-nfsv4-4M-60G-NoSync-performance
            7.50  7%  8.00  ±  6%  fsmark.files_per_sec
          fsmark/1x-1t-1HDD-btrfs-nfsv4-4M-60G-fsyncBeforeClose-performance
          fsmark/1x-1t-1HDD-btrfs-4M-60G-NoSync-performance
          fsmark/1x-1t-1HDD-btrfs-4M-60G-fsyncBeforeClose-performance
          kbuild/300s-50%-vmlinux_prereq-performance
          kbuild/300s-200%-vmlinux_prereq-performance
          kbuild/300s-50%-vmlinux_prereq-performance-1HDD-ext4
          kbuild/300s-200%-vmlinux_prereq-performance-1HDD-ext4
      
      (b) lkp-skl-4sp1:
            192 threads Intel(R) Xeon(R) Platinum 8160 768G
      
          dbench/100%-performance
          ebizzy/200%-100x-10s-performance
          hackbench/1600%-process-pipe-performance
          iperf/300s-cs-localhost-tcp-performance
          iperf/300s-cs-localhost-udp-performance
          perf-bench-numa-mem/2t-300M-performance
          perf-bench-sched-pipe/10000000ops-process-performance
          perf-bench-sched-pipe/10000000ops-threads-performance
          schbench/2-16-300-30000-30000-performance
          tbench/100%-cs-localhost-performance
      
      (c) lkp-bdw-ep6:
            88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 128G
      
          stress-ng/100%-60s-pipe-performance
          unixbench/300s-1-whetstone-double-performance
          unixbench/300s-1-shell1-performance
          unixbench/300s-1-shell8-performance
          unixbench/300s-1-pipe-performance
        * unixbench/300s-1-context1-performance
            312  315  unixbench.score
          unixbench/300s-1-spawn-performance
          unixbench/300s-1-syscall-performance
          unixbench/300s-1-dhry2reg-performance
          unixbench/300s-1-fstime-performance
          unixbench/300s-1-fsbuffer-performance
          unixbench/300s-1-fsdisk-performance
          unixbench/300s-100%-whetstone-double-performance
          unixbench/300s-100%-shell1-performance
          unixbench/300s-100%-shell8-performance
          unixbench/300s-100%-pipe-performance
          unixbench/300s-100%-context1-performance
          unixbench/300s-100%-spawn-performance
        * unixbench/300s-100%-syscall-performance
            3571  ±  3%  -11%  3183  ±  4%  unixbench.score
          unixbench/300s-100%-dhry2reg-performance
          unixbench/300s-100%-fstime-performance
          unixbench/300s-100%-fsbuffer-performance
          unixbench/300s-100%-fsdisk-performance
          unixbench/300s-1-execl-performance
          unixbench/300s-100%-execl-performance
        * will-it-scale/brk1-performance
            365004  360387  will-it-scale.per_thread_ops
        * will-it-scale/dup1-performance
            432401  437596  will-it-scale.per_thread_ops
          will-it-scale/eventfd1-performance
          will-it-scale/futex1-performance
          will-it-scale/futex2-performance
          will-it-scale/futex3-performance
          will-it-scale/futex4-performance
          will-it-scale/getppid1-performance
          will-it-scale/lock1-performance
          will-it-scale/lseek1-performance
          will-it-scale/lseek2-performance
        * will-it-scale/malloc1-performance
            47025  45817  will-it-scale.per_thread_ops
            77499  76529  will-it-scale.per_process_ops
          will-it-scale/malloc2-performance
        * will-it-scale/mmap1-performance
            123399  120815  will-it-scale.per_thread_ops
            152219  149833  will-it-scale.per_process_ops
        * will-it-scale/mmap2-performance
            107327  104714  will-it-scale.per_thread_ops
            136405  133765  will-it-scale.per_process_ops
          will-it-scale/open1-performance
        * will-it-scale/open2-performance
            171570  168805  will-it-scale.per_thread_ops
            532644  526202  will-it-scale.per_process_ops
          will-it-scale/page_fault1-performance
          will-it-scale/page_fault2-performance
          will-it-scale/page_fault3-performance
          will-it-scale/pipe1-performance
          will-it-scale/poll1-performance
        * will-it-scale/poll2-performance
            176134  172848  will-it-scale.per_thread_ops
            281361  275053  will-it-scale.per_process_ops
          will-it-scale/posix_semaphore1-performance
          will-it-scale/pread1-performance
          will-it-scale/pread2-performance
          will-it-scale/pread3-performance
          will-it-scale/pthread_mutex1-performance
          will-it-scale/pthread_mutex2-performance
          will-it-scale/pwrite1-performance
          will-it-scale/pwrite2-performance
          will-it-scale/pwrite3-performance
        * will-it-scale/read1-performance
            1190563  1174833  will-it-scale.per_thread_ops
        * will-it-scale/read2-performance
            1105369  1080427  will-it-scale.per_thread_ops
          will-it-scale/readseek1-performance
        * will-it-scale/readseek2-performance
            261818  259040  will-it-scale.per_thread_ops
          will-it-scale/readseek3-performance
        * will-it-scale/sched_yield-performance
            2408059  2382034  will-it-scale.per_thread_ops
          will-it-scale/signal1-performance
          will-it-scale/unix1-performance
          will-it-scale/unlink1-performance
          will-it-scale/unlink2-performance
        * will-it-scale/write1-performance
            976701  961588  will-it-scale.per_thread_ops
        * will-it-scale/writeseek1-performance
            831898  822448  will-it-scale.per_thread_ops
        * will-it-scale/writeseek2-performance
            228248  225065  will-it-scale.per_thread_ops
        * will-it-scale/writeseek3-performance
            226670  224058  will-it-scale.per_thread_ops
          will-it-scale/context_switch1-performance
          aim7/performance-fork_test-2000
        * aim7/performance-brk_test-3000
            74869  76676  aim7.jobs-per-min
          aim7/performance-disk_cp-3000
          aim7/performance-disk_rd-3000
          aim7/performance-sieve-3000
          aim7/performance-page_test-3000
          aim7/performance-creat-clo-3000
          aim7/performance-mem_rtns_1-8000
          aim7/performance-disk_wrt-8000
          aim7/performance-pipe_cpy-8000
          aim7/performance-ram_copy-8000
      
      (d) lkp-avoton3:
            8 threads Intel(R) Atom(TM) CPU C2750 @ 2.40GHz 16G
      
          netperf/ipv4-900s-200%-cs-localhost-TCP_STREAM-performance
      Signed-off-by: default avatarDietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Li Zhijian <zhijianx.li@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180809135753.21077-1-dietmar.eggemann@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fdf5f315
    • Vincent Guittot's avatar
      sched/pelt: Fix warning and clean up IRQ PELT config · 11d4afd4
      Vincent Guittot authored
      Create a config for enabling irq load tracking in the scheduler.
      irq load tracking is useful only when irq or paravirtual time is
      accounted but it's only possible with SMP for now.
      
      Also use __maybe_unused to remove the compilation warning in
      update_rq_clock_task() that has been introduced by:
      
        2e62c474 ("sched/fair: Remove #ifdefs from scale_rt_capacity()")
      Suggested-by: default avatarIngo Molnar <mingo@redhat.com>
      Reported-by: default avatarDou Liyang <douly.fnst@cn.fujitsu.com>
      Reported-by: default avatarMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: default avatarVincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bp@alien8.de
      Cc: dou_liyang@163.com
      Fixes: 2e62c474 ("sched/fair: Remove #ifdefs from scale_rt_capacity()")
      Link: http://lkml.kernel.org/r/1537867062-27285-1-git-send-email-vincent.guittot@linaro.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      11d4afd4
    • Ingo Molnar's avatar
    • Srikar Dronamraju's avatar
      sched/numa: Avoid task migration for small NUMA improvement · 6fd98e77
      Srikar Dronamraju authored
      If NUMA improvement from the task migration is going to be very
      minimal, then avoid task migration.
      
      Specjbb2005 results (8 warehouses)
      Higher bops are better
      
      2 Socket - 2  Node Haswell - X86
      JVMS  Prev    Current  %Change
      4     198512  205910   3.72673
      1     313559  318491   1.57291
      
      2 Socket - 4 Node Power8 - PowerNV
      JVMS  Prev     Current  %Change
      8     74761.9  74935.9  0.232739
      1     214874   226796   5.54837
      
      2 Socket - 2  Node Power9 - PowerNV
      JVMS  Prev    Current  %Change
      4     180536  189780   5.12031
      1     210281  205695   -2.18089
      
      4 Socket - 4  Node Power7 - PowerVM
      JVMS  Prev     Current  %Change
      8     56511.4  60370    6.828
      1     104899   108100   3.05151
      
      1/7 cases is regressing, if we look at events migrate_pages seem
      to vary the most especially in the regressing case. Also some
      amount of variance is expected between different runs of
      Specjbb2005.
      
      Some events stats before and after applying the patch.
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        13,818,546      13,801,554
      migrations                1,149,960       1,151,541
      faults                    385,583         433,246
      cache-misses              55,259,546,768  55,168,691,835
      sched:sched_move_numa     2,257           2,551
      sched:sched_stick_numa    9               24
      sched:sched_swap_numa     512             904
      migrate:mm_migrate_pages  2,225           1,571
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        72692   113682
      numa_hint_faults_local  62270   102163
      numa_hit                238762  240181
      numa_huge_pte_updates   48      36
      numa_interleave         75      64
      numa_local              238676  240103
      numa_other              86      78
      numa_pages_migrated     2225    1564
      numa_pte_updates        98557   134080
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        3,173,490       3,079,150
      migrations                36,966          31,455
      faults                    108,776         99,081
      cache-misses              12,200,075,320  11,588,126,740
      sched:sched_move_numa     1,264           1
      sched:sched_stick_numa    0               0
      sched:sched_swap_numa     0               0
      migrate:mm_migrate_pages  899             36
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        21109   430
      numa_hint_faults_local  17120   77
      numa_hit                72934   71277
      numa_huge_pte_updates   42      0
      numa_interleave         33      22
      numa_local              72866   71218
      numa_other              68      59
      numa_pages_migrated     915     23
      numa_pte_updates        42326   0
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before       After
      cs                        8,312,022    8,707,565
      migrations                231,705      171,342
      faults                    310,242      310,820
      cache-misses              402,324,573  136,115,400
      sched:sched_move_numa     193          215
      sched:sched_stick_numa    0            6
      sched:sched_swap_numa     3            24
      migrate:mm_migrate_pages  93           162
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        11838   8985
      numa_hint_faults_local  11216   8154
      numa_hit                90689   93819
      numa_huge_pte_updates   0       0
      numa_interleave         1579    882
      numa_local              89634   93496
      numa_other              1055    323
      numa_pages_migrated     92      169
      numa_pte_updates        12109   9217
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before      After
      cs                        2,170,481   2,152,072
      migrations                10,126      10,704
      faults                    160,962     164,376
      cache-misses              10,834,845  3,818,437
      sched:sched_move_numa     10          16
      sched:sched_stick_numa    0           0
      sched:sched_swap_numa     0           7
      migrate:mm_migrate_pages  2           199
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        403     2248
      numa_hint_faults_local  358     1666
      numa_hit                25898   25704
      numa_huge_pte_updates   0       0
      numa_interleave         207     200
      numa_local              25860   25679
      numa_other              38      25
      numa_pages_migrated     2       197
      numa_pte_updates        400     2234
      
      perf stats 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before           After
      cs                        110,339,633      93,330,595
      migrations                4,139,812        4,122,061
      faults                    863,622          865,979
      cache-misses              231,838,045,660  225,395,083,479
      sched:sched_move_numa     2,196            2,372
      sched:sched_stick_numa    33               24
      sched:sched_swap_numa     544              769
      migrate:mm_migrate_pages  2,469            1,677
      
      vmstat 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        85748   91638
      numa_hint_faults_local  66831   78096
      numa_hit                242213  242225
      numa_huge_pte_updates   0       0
      numa_interleave         0       2
      numa_local              242211  242219
      numa_other              2       6
      numa_pages_migrated     2376    1515
      numa_pte_updates        86233   92274
      
      perf stats 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before          After
      cs                        59,331,057      51,487,271
      migrations                552,019         537,170
      faults                    266,586         256,921
      cache-misses              73,796,312,990  70,073,831,187
      sched:sched_move_numa     981             576
      sched:sched_stick_numa    54              24
      sched:sched_swap_numa     286             327
      migrate:mm_migrate_pages  713             726
      
      vmstat 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        14807   12000
      numa_hint_faults_local  5738    5024
      numa_hit                36230   36470
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              36228   36465
      numa_other              2       5
      numa_pages_migrated     703     726
      numa_pte_updates        14742   11930
      Signed-off-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Jirka Hladky <jhladky@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1537552141-27815-7-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6fd98e77
    • Srikar Dronamraju's avatar
      mm/migrate: Use spin_trylock() while resetting rate limit · 75346121
      Srikar Dronamraju authored
      Since this spinlock will only serialize the migrate rate limiting,
      convert the spin_lock() to a spin_trylock(). If another thread is updating, this
      task can move on.
      
      Specjbb2005 results (8 warehouses)
      Higher bops are better
      
      2 Socket - 2  Node Haswell - X86
      JVMS  Prev    Current  %Change
      4     205332  198512   -3.32145
      1     319785  313559   -1.94693
      
      2 Socket - 4 Node Power8 - PowerNV
      JVMS  Prev    Current  %Change
      8     74912   74761.9  -0.200368
      1     206585  214874   4.01239
      
      2 Socket - 2  Node Power9 - PowerNV
      JVMS  Prev    Current  %Change
      4     189162  180536   -4.56011
      1     213760  210281   -1.62753
      
      4 Socket - 4  Node Power7 - PowerVM
      JVMS  Prev     Current  %Change
      8     58736.8  56511.4  -3.78877
      1     105419   104899   -0.49327
      
      Avoiding stretching of window intervals may be the reason for the
      regression. Also code now uses READ_ONCE/WRITE_ONCE. That may
      also be hurting performance to some extent.
      
      Some events stats before and after applying the patch.
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        14,285,708      13,818,546
      migrations                1,180,621       1,149,960
      faults                    339,114         385,583
      cache-misses              55,205,631,894  55,259,546,768
      sched:sched_move_numa     843             2,257
      sched:sched_stick_numa    6               9
      sched:sched_swap_numa     219             512
      migrate:mm_migrate_pages  365             2,225
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        26907   72692
      numa_hint_faults_local  24279   62270
      numa_hit                239771  238762
      numa_huge_pte_updates   0       48
      numa_interleave         68      75
      numa_local              239688  238676
      numa_other              83      86
      numa_pages_migrated     363     2225
      numa_pte_updates        27415   98557
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        3,202,779       3,173,490
      migrations                37,186          36,966
      faults                    106,076         108,776
      cache-misses              12,024,873,744  12,200,075,320
      sched:sched_move_numa     931             1,264
      sched:sched_stick_numa    0               0
      sched:sched_swap_numa     1               0
      migrate:mm_migrate_pages  637             899
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        17409   21109
      numa_hint_faults_local  14367   17120
      numa_hit                73953   72934
      numa_huge_pte_updates   20      42
      numa_interleave         25      33
      numa_local              73892   72866
      numa_other              61      68
      numa_pages_migrated     668     915
      numa_pte_updates        27276   42326
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before       After
      cs                        8,474,013    8,312,022
      migrations                254,934      231,705
      faults                    320,506      310,242
      cache-misses              110,580,458  402,324,573
      sched:sched_move_numa     725          193
      sched:sched_stick_numa    0            0
      sched:sched_swap_numa     7            3
      migrate:mm_migrate_pages  145          93
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        22797   11838
      numa_hint_faults_local  21539   11216
      numa_hit                89308   90689
      numa_huge_pte_updates   0       0
      numa_interleave         865     1579
      numa_local              88955   89634
      numa_other              353     1055
      numa_pages_migrated     149     92
      numa_pte_updates        22930   12109
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before     After
      cs                        2,195,628  2,170,481
      migrations                11,179     10,126
      faults                    149,656    160,962
      cache-misses              8,117,515  10,834,845
      sched:sched_move_numa     49         10
      sched:sched_stick_numa    0          0
      sched:sched_swap_numa     0          0
      migrate:mm_migrate_pages  5          2
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        3577    403
      numa_hint_faults_local  3476    358
      numa_hit                26142   25898
      numa_huge_pte_updates   0       0
      numa_interleave         358     207
      numa_local              26042   25860
      numa_other              100     38
      numa_pages_migrated     5       2
      numa_pte_updates        3587    400
      
      perf stats 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before           After
      cs                        100,602,296      110,339,633
      migrations                4,135,630        4,139,812
      faults                    789,256          863,622
      cache-misses              226,160,621,058  231,838,045,660
      sched:sched_move_numa     1,366            2,196
      sched:sched_stick_numa    16               33
      sched:sched_swap_numa     374              544
      migrate:mm_migrate_pages  1,350            2,469
      
      vmstat 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        47857   85748
      numa_hint_faults_local  39768   66831
      numa_hit                240165  242213
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              240165  242211
      numa_other              0       2
      numa_pages_migrated     1224    2376
      numa_pte_updates        48354   86233
      
      perf stats 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before          After
      cs                        58,515,496      59,331,057
      migrations                564,845         552,019
      faults                    245,807         266,586
      cache-misses              73,603,757,976  73,796,312,990
      sched:sched_move_numa     996             981
      sched:sched_stick_numa    10              54
      sched:sched_swap_numa     193             286
      migrate:mm_migrate_pages  646             713
      
      vmstat 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        13422   14807
      numa_hint_faults_local  5619    5738
      numa_hit                36118   36230
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              36116   36228
      numa_other              2       2
      numa_pages_migrated     616     703
      numa_pte_updates        13374   14742
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Jirka Hladky <jhladky@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1537552141-27815-6-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      75346121
    • Mel Gorman's avatar
      sched/numa: Limit the conditions where scan period is reset · 05cbdf4f
      Mel Gorman authored
      migrate_task_rq_fair() resets the scan rate for NUMA balancing on every
      cross-node migration. In the event of excessive load balancing due to
      saturation, this may result in the scan rate being pegged at maximum and
      further overloading the machine.
      
      This patch only resets the scan if NUMA balancing is active, a preferred
      node has been selected and the task is being migrated from the preferred
      node as these are the most harmful. For example, a migration to the preferred
      node does not justify a faster scan rate. Similarly, a migration between two
      nodes that are not preferred is probably bouncing due to over-saturation of
      the machine.  In that case, scanning faster and trapping more NUMA faults
      will further overload the machine.
      
      Specjbb2005 results (8 warehouses)
      Higher bops are better
      
      2 Socket - 2  Node Haswell - X86
      JVMS  Prev    Current  %Change
      4     203370  205332   0.964744
      1     328431  319785   -2.63252
      
      2 Socket - 4 Node Power8 - PowerNV
      JVMS  Prev    Current  %Change
      1     206070  206585   0.249915
      
      2 Socket - 2  Node Power9 - PowerNV
      JVMS  Prev    Current  %Change
      4     188386  189162   0.41192
      1     201566  213760   6.04963
      
      4 Socket - 4  Node Power7 - PowerVM
      JVMS  Prev     Current  %Change
      8     59157.4  58736.8  -0.710985
      1     105495   105419   -0.0720413
      
      Some events stats before and after applying the patch.
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        13,825,492      14,285,708
      migrations                1,152,509       1,180,621
      faults                    371,948         339,114
      cache-misses              55,654,206,041  55,205,631,894
      sched:sched_move_numa     1,856           843
      sched:sched_stick_numa    4               6
      sched:sched_swap_numa     428             219
      migrate:mm_migrate_pages  898             365
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        57146   26907
      numa_hint_faults_local  51612   24279
      numa_hit                238164  239771
      numa_huge_pte_updates   16      0
      numa_interleave         63      68
      numa_local              238085  239688
      numa_other              79      83
      numa_pages_migrated     883     363
      numa_pte_updates        67540   27415
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        3,288,525       3,202,779
      migrations                38,652          37,186
      faults                    111,678         106,076
      cache-misses              12,111,197,376  12,024,873,744
      sched:sched_move_numa     900             931
      sched:sched_stick_numa    0               0
      sched:sched_swap_numa     5               1
      migrate:mm_migrate_pages  714             637
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        18572   17409
      numa_hint_faults_local  14850   14367
      numa_hit                73197   73953
      numa_huge_pte_updates   11      20
      numa_interleave         25      25
      numa_local              73138   73892
      numa_other              59      61
      numa_pages_migrated     712     668
      numa_pte_updates        24021   27276
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before       After
      cs                        8,451,543    8,474,013
      migrations                202,804      254,934
      faults                    310,024      320,506
      cache-misses              253,522,507  110,580,458
      sched:sched_move_numa     213          725
      sched:sched_stick_numa    0            0
      sched:sched_swap_numa     2            7
      migrate:mm_migrate_pages  88           145
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        11830   22797
      numa_hint_faults_local  11301   21539
      numa_hit                90038   89308
      numa_huge_pte_updates   0       0
      numa_interleave         855     865
      numa_local              89796   88955
      numa_other              242     353
      numa_pages_migrated     88      149
      numa_pte_updates        12039   22930
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before     After
      cs                        2,049,153  2,195,628
      migrations                11,405     11,179
      faults                    162,309    149,656
      cache-misses              7,203,343  8,117,515
      sched:sched_move_numa     22         49
      sched:sched_stick_numa    0          0
      sched:sched_swap_numa     0          0
      migrate:mm_migrate_pages  1          5
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        1693    3577
      numa_hint_faults_local  1669    3476
      numa_hit                25177   26142
      numa_huge_pte_updates   0       0
      numa_interleave         194     358
      numa_local              24993   26042
      numa_other              184     100
      numa_pages_migrated     1       5
      numa_pte_updates        1577    3587
      
      perf stats 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before           After
      cs                        94,515,937       100,602,296
      migrations                4,203,554        4,135,630
      faults                    832,697          789,256
      cache-misses              226,248,698,331  226,160,621,058
      sched:sched_move_numa     1,730            1,366
      sched:sched_stick_numa    14               16
      sched:sched_swap_numa     432              374
      migrate:mm_migrate_pages  1,398            1,350
      
      vmstat 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        80079   47857
      numa_hint_faults_local  68620   39768
      numa_hit                241187  240165
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              241186  240165
      numa_other              1       0
      numa_pages_migrated     1347    1224
      numa_pte_updates        80729   48354
      
      perf stats 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before          After
      cs                        63,704,961      58,515,496
      migrations                573,404         564,845
      faults                    230,878         245,807
      cache-misses              76,568,222,781  73,603,757,976
      sched:sched_move_numa     509             996
      sched:sched_stick_numa    31              10
      sched:sched_swap_numa     182             193
      migrate:mm_migrate_pages  541             646
      
      vmstat 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        8501    13422
      numa_hint_faults_local  2960    5619
      numa_hit                35526   36118
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              35526   36116
      numa_other              0       2
      numa_pages_migrated     539     616
      numa_pte_updates        8433    13374
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Jirka Hladky <jhladky@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1537552141-27815-5-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      05cbdf4f
    • Srikar Dronamraju's avatar
      sched/numa: Reset scan rate whenever task moves across nodes · 3f9672ba
      Srikar Dronamraju authored
      Currently task scan rate is reset when NUMA balancer migrates the task
      to a different node. If NUMA balancer initiates a swap, reset is only
      applicable to the task that initiates the swap. Similarly no scan rate
      reset is done if the task is migrated across nodes by traditional load
      balancer.
      
      Instead move the scan reset to the migrate_task_rq. This ensures the
      task moved out of its preferred node, either gets back to its preferred
      node quickly or finds a new preferred node. Doing so, would be fair to
      all tasks migrating across nodes.
      
      Specjbb2005 results (8 warehouses)
      Higher bops are better
      
      2 Socket - 2  Node Haswell - X86
      JVMS  Prev    Current  %Change
      4     200668  203370   1.3465
      1     321791  328431   2.06345
      
      2 Socket - 4 Node Power8 - PowerNV
      JVMS  Prev    Current  %Change
      1     204848  206070   0.59654
      
      2 Socket - 2  Node Power9 - PowerNV
      JVMS  Prev    Current  %Change
      4     188098  188386   0.153112
      1     200351  201566   0.606436
      
      4 Socket - 4  Node Power7 - PowerVM
      JVMS  Prev     Current  %Change
      8     58145.9  59157.4  1.73959
      1     103798   105495   1.63491
      
      Some events stats before and after applying the patch.
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        13,912,183      13,825,492
      migrations                1,155,931       1,152,509
      faults                    367,139         371,948
      cache-misses              54,240,196,814  55,654,206,041
      sched:sched_move_numa     1,571           1,856
      sched:sched_stick_numa    9               4
      sched:sched_swap_numa     463             428
      migrate:mm_migrate_pages  703             898
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        50155   57146
      numa_hint_faults_local  45264   51612
      numa_hit                239652  238164
      numa_huge_pte_updates   36      16
      numa_interleave         68      63
      numa_local              239576  238085
      numa_other              76      79
      numa_pages_migrated     680     883
      numa_pte_updates        71146   67540
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        3,156,720       3,288,525
      migrations                30,354          38,652
      faults                    97,261          111,678
      cache-misses              12,400,026,826  12,111,197,376
      sched:sched_move_numa     4               900
      sched:sched_stick_numa    0               0
      sched:sched_swap_numa     1               5
      migrate:mm_migrate_pages  20              714
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        272     18572
      numa_hint_faults_local  186     14850
      numa_hit                71362   73197
      numa_huge_pte_updates   0       11
      numa_interleave         23      25
      numa_local              71299   73138
      numa_other              63      59
      numa_pages_migrated     2       712
      numa_pte_updates        0       24021
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before       After
      cs                        8,606,824    8,451,543
      migrations                155,352      202,804
      faults                    301,409      310,024
      cache-misses              157,759,224  253,522,507
      sched:sched_move_numa     168          213
      sched:sched_stick_numa    0            0
      sched:sched_swap_numa     3            2
      migrate:mm_migrate_pages  125          88
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        4650    11830
      numa_hint_faults_local  3946    11301
      numa_hit                90489   90038
      numa_huge_pte_updates   0       0
      numa_interleave         892     855
      numa_local              90034   89796
      numa_other              455     242
      numa_pages_migrated     124     88
      numa_pte_updates        4818    12039
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before     After
      cs                        2,113,167  2,049,153
      migrations                10,533     11,405
      faults                    142,727    162,309
      cache-misses              5,594,192  7,203,343
      sched:sched_move_numa     10         22
      sched:sched_stick_numa    0          0
      sched:sched_swap_numa     0          0
      migrate:mm_migrate_pages  6          1
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        744     1693
      numa_hint_faults_local  584     1669
      numa_hit                25551   25177
      numa_huge_pte_updates   0       0
      numa_interleave         263     194
      numa_local              25302   24993
      numa_other              249     184
      numa_pages_migrated     6       1
      numa_pte_updates        744     1577
      
      perf stats 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before           After
      cs                        101,227,352      94,515,937
      migrations                4,151,829        4,203,554
      faults                    745,233          832,697
      cache-misses              224,669,561,766  226,248,698,331
      sched:sched_move_numa     617              1,730
      sched:sched_stick_numa    2                14
      sched:sched_swap_numa     187              432
      migrate:mm_migrate_pages  316              1,398
      
      vmstat 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        24195   80079
      numa_hint_faults_local  21639   68620
      numa_hit                238331  241187
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              238331  241186
      numa_other              0       1
      numa_pages_migrated     204     1347
      numa_pte_updates        24561   80729
      
      perf stats 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before          After
      cs                        62,738,978      63,704,961
      migrations                562,702         573,404
      faults                    228,465         230,878
      cache-misses              75,778,067,952  76,568,222,781
      sched:sched_move_numa     648             509
      sched:sched_stick_numa    13              31
      sched:sched_swap_numa     137             182
      migrate:mm_migrate_pages  733             541
      
      vmstat 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        10281   8501
      numa_hint_faults_local  3242    2960
      numa_hit                36338   35526
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              36338   35526
      numa_other              0       0
      numa_pages_migrated     706     539
      numa_pte_updates        10176   8433
      Signed-off-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Jirka Hladky <jhladky@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1537552141-27815-4-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3f9672ba
    • Srikar Dronamraju's avatar
      sched/numa: Pass destination CPU as a parameter to migrate_task_rq · 1327237a
      Srikar Dronamraju authored
      This additional parameter (new_cpu) is used later for identifying if
      task migration is across nodes.
      
      No functional change.
      
      Specjbb2005 results (8 warehouses)
      Higher bops are better
      
      2 Socket - 2  Node Haswell - X86
      JVMS  Prev    Current  %Change
      4     203353  200668   -1.32036
      1     328205  321791   -1.95427
      
      2 Socket - 4 Node Power8 - PowerNV
      JVMS  Prev    Current  %Change
      1     214384  204848   -4.44809
      
      2 Socket - 2  Node Power9 - PowerNV
      JVMS  Prev    Current  %Change
      4     188553  188098   -0.241311
      1     196273  200351   2.07772
      
      4 Socket - 4  Node Power7 - PowerVM
      JVMS  Prev     Current  %Change
      8     57581.2  58145.9  0.980702
      1     103468   103798   0.318939
      
      Brings out the variance between different specjbb2005 runs.
      
      Some events stats before and after applying the patch.
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        13,941,377      13,912,183
      migrations                1,157,323       1,155,931
      faults                    382,175         367,139
      cache-misses              54,993,823,500  54,240,196,814
      sched:sched_move_numa     2,005           1,571
      sched:sched_stick_numa    14              9
      sched:sched_swap_numa     529             463
      migrate:mm_migrate_pages  1,573           703
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        67099   50155
      numa_hint_faults_local  58456   45264
      numa_hit                240416  239652
      numa_huge_pte_updates   18      36
      numa_interleave         65      68
      numa_local              240339  239576
      numa_other              77      76
      numa_pages_migrated     1574    680
      numa_pte_updates        77182   71146
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        3,176,453       3,156,720
      migrations                30,238          30,354
      faults                    87,869          97,261
      cache-misses              12,544,479,391  12,400,026,826
      sched:sched_move_numa     23              4
      sched:sched_stick_numa    0               0
      sched:sched_swap_numa     6               1
      migrate:mm_migrate_pages  10              20
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        236     272
      numa_hint_faults_local  201     186
      numa_hit                72293   71362
      numa_huge_pte_updates   0       0
      numa_interleave         26      23
      numa_local              72233   71299
      numa_other              60      63
      numa_pages_migrated     8       2
      numa_pte_updates        0       0
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before       After
      cs                        8,478,820    8,606,824
      migrations                171,323      155,352
      faults                    307,499      301,409
      cache-misses              240,353,599  157,759,224
      sched:sched_move_numa     214          168
      sched:sched_stick_numa    0            0
      sched:sched_swap_numa     4            3
      migrate:mm_migrate_pages  89           125
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        5301    4650
      numa_hint_faults_local  4745    3946
      numa_hit                92943   90489
      numa_huge_pte_updates   0       0
      numa_interleave         899     892
      numa_local              92345   90034
      numa_other              598     455
      numa_pages_migrated     88      124
      numa_pte_updates        5505    4818
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before      After
      cs                        2,066,172   2,113,167
      migrations                11,076      10,533
      faults                    149,544     142,727
      cache-misses              10,398,067  5,594,192
      sched:sched_move_numa     43          10
      sched:sched_stick_numa    0           0
      sched:sched_swap_numa     0           0
      migrate:mm_migrate_pages  6           6
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        3552    744
      numa_hint_faults_local  3347    584
      numa_hit                25611   25551
      numa_huge_pte_updates   0       0
      numa_interleave         213     263
      numa_local              25583   25302
      numa_other              28      249
      numa_pages_migrated     6       6
      numa_pte_updates        3535    744
      
      perf stats 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before           After
      cs                        99,358,136       101,227,352
      migrations                4,041,607        4,151,829
      faults                    749,653          745,233
      cache-misses              225,562,543,251  224,669,561,766
      sched:sched_move_numa     771              617
      sched:sched_stick_numa    14               2
      sched:sched_swap_numa     204              187
      migrate:mm_migrate_pages  1,180            316
      
      vmstat 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        27409   24195
      numa_hint_faults_local  20677   21639
      numa_hit                239988  238331
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              239983  238331
      numa_other              5       0
      numa_pages_migrated     1016    204
      numa_pte_updates        27916   24561
      
      perf stats 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before          After
      cs                        60,899,307      62,738,978
      migrations                544,668         562,702
      faults                    270,834         228,465
      cache-misses              74,543,455,635  75,778,067,952
      sched:sched_move_numa     735             648
      sched:sched_stick_numa    25              13
      sched:sched_swap_numa     174             137
      migrate:mm_migrate_pages  816             733
      
      vmstat 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        11059   10281
      numa_hint_faults_local  4733    3242
      numa_hit                41384   36338
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              41383   36338
      numa_other              1       0
      numa_pages_migrated     815     706
      numa_pte_updates        11323   10176
      Signed-off-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Jirka Hladky <jhladky@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1537552141-27815-3-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1327237a
    • Srikar Dronamraju's avatar
      sched/numa: Stop multiple tasks from moving to the CPU at the same time · a4739eca
      Srikar Dronamraju authored
      Task migration under NUMA balancing can happen in parallel. More than
      one task might choose to migrate to the same CPU at the same time. This
      can result in:
      
      - During task swap, choosing a task that was not part of the evaluation.
      - During task swap, task which just got moved into its preferred node,
        moving to a completely different node.
      - During task swap, task failing to move to the preferred node, will have
        to wait an extra interval for the next migrate opportunity.
      - During task movement, multiple task movements can cause load imbalance.
      
      This problem is more likely if there are more cores per node or more
      nodes in the system.
      
      Use a per run-queue variable to check if NUMA-balance is active on the
      run-queue.
      
      Specjbb2005 results (8 warehouses)
      Higher bops are better
      
      2 Socket - 2  Node Haswell - X86
      JVMS  Prev    Current  %Change
      4     200194  203353   1.57797
      1     311331  328205   5.41995
      
      2 Socket - 4 Node Power8 - PowerNV
      JVMS  Prev    Current  %Change
      1     197654  214384   8.46429
      
      2 Socket - 2  Node Power9 - PowerNV
      JVMS  Prev    Current  %Change
      4     192605  188553   -2.10379
      1     213402  196273   -8.02664
      
      4 Socket - 4  Node Power7 - PowerVM
      JVMS  Prev     Current  %Change
      8     52227.1  57581.2  10.2516
      1     102529   103468   0.915838
      
      There is a regression on power 9 box. If we look at the details,
      that box has a sudden jump in cache-misses with this patch.
      All other parameters seem to be pointing towards NUMA
      consolidation.
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        13,345,784      13,941,377
      migrations                1,127,820       1,157,323
      faults                    374,736         382,175
      cache-misses              55,132,054,603  54,993,823,500
      sched:sched_move_numa     1,923           2,005
      sched:sched_stick_numa    52              14
      sched:sched_swap_numa     595             529
      migrate:mm_migrate_pages  1,932           1,573
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        60605   67099
      numa_hint_faults_local  51804   58456
      numa_hit                239945  240416
      numa_huge_pte_updates   14      18
      numa_interleave         60      65
      numa_local              239865  240339
      numa_other              80      77
      numa_pages_migrated     1931    1574
      numa_pte_updates        67823   77182
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        3,016,467       3,176,453
      migrations                37,326          30,238
      faults                    115,342         87,869
      cache-misses              11,692,155,554  12,544,479,391
      sched:sched_move_numa     965             23
      sched:sched_stick_numa    8               0
      sched:sched_swap_numa     35              6
      migrate:mm_migrate_pages  1,168           10
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        16286   236
      numa_hint_faults_local  11863   201
      numa_hit                112482  72293
      numa_huge_pte_updates   33      0
      numa_interleave         20      26
      numa_local              112419  72233
      numa_other              63      60
      numa_pages_migrated     1144    8
      numa_pte_updates        32859   0
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before       After
      cs                        8,629,724    8,478,820
      migrations                221,052      171,323
      faults                    308,661      307,499
      cache-misses              135,574,913  240,353,599
      sched:sched_move_numa     147          214
      sched:sched_stick_numa    0            0
      sched:sched_swap_numa     2            4
      migrate:mm_migrate_pages  64           89
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        11481   5301
      numa_hint_faults_local  10968   4745
      numa_hit                89773   92943
      numa_huge_pte_updates   0       0
      numa_interleave         1116    899
      numa_local              89220   92345
      numa_other              553     598
      numa_pages_migrated     62      88
      numa_pte_updates        11694   5505
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before     After
      cs                        2,272,887  2,066,172
      migrations                12,206     11,076
      faults                    163,704    149,544
      cache-misses              4,801,186  10,398,067
      sched:sched_move_numa     44         43
      sched:sched_stick_numa    0          0
      sched:sched_swap_numa     0          0
      migrate:mm_migrate_pages  17         6
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        2261    3552
      numa_hint_faults_local  1993    3347
      numa_hit                25726   25611
      numa_huge_pte_updates   0       0
      numa_interleave         239     213
      numa_local              25498   25583
      numa_other              228     28
      numa_pages_migrated     17      6
      numa_pte_updates        2266    3535
      
      perf stats 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before           After
      cs                        117,980,962      99,358,136
      migrations                3,950,220        4,041,607
      faults                    736,979          749,653
      cache-misses              224,976,072,879  225,562,543,251
      sched:sched_move_numa     504              771
      sched:sched_stick_numa    50               14
      sched:sched_swap_numa     239              204
      migrate:mm_migrate_pages  1,260            1,180
      
      vmstat 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        18293   27409
      numa_hint_faults_local  11969   20677
      numa_hit                240854  239988
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              240851  239983
      numa_other              3       5
      numa_pages_migrated     1190    1016
      numa_pte_updates        18106   27916
      
      perf stats 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before          After
      cs                        61,053,158      60,899,307
      migrations                551,586         544,668
      faults                    244,174         270,834
      cache-misses              74,326,766,973  74,543,455,635
      sched:sched_move_numa     344             735
      sched:sched_stick_numa    24              25
      sched:sched_swap_numa     140             174
      migrate:mm_migrate_pages  568             816
      
      vmstat 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        6461    11059
      numa_hint_faults_local  2283    4733
      numa_hit                35661   41384
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              35661   41383
      numa_other              0       1
      numa_pages_migrated     568     815
      numa_pte_updates        6518    11323
      Signed-off-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarRik van Riel <riel@surriel.com>
      Acked-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Jirka Hladky <jhladky@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1537552141-27815-2-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a4739eca
  2. 29 Sep, 2018 12 commits
    • Greg Kroah-Hartman's avatar
      Merge tag 'for-linus-20180929' of git://git.kernel.dk/linux-block · 291d0e5d
      Greg Kroah-Hartman authored
      Jens writes:
        "Block fixes for 4.19-rc6
      
         A set of fixes that should go into this release. This pull request
         contains:
      
         - A fix (hopefully) for the persistent grants for xen-blkfront. A
           previous fix from this series wasn't complete, hence reverted, and
           this one should hopefully be it. (Boris Ostrovsky)
      
         - Fix for an elevator drain warning with SMR devices, which is
           triggered when you switch schedulers (Damien)
      
         - bcache deadlock fix (Guoju Fang)
      
         - Fix for the block unplug tracepoint, which has had the
           timer/explicit flag reverted since 4.11 (Ilya)
      
         - Fix a regression in this series where the blk-mq timeout hook is
           invoked with the RCU read lock held, hence preventing it from
           blocking (Keith)
      
         - NVMe pull from Christoph, with a single multipath fix (Susobhan Dey)"
      
      * tag 'for-linus-20180929' of git://git.kernel.dk/linux-block:
        xen/blkfront: correct purging of persistent grants
        Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
        blk-mq: I/O and timer unplugs are inverted in blktrace
        bcache: add separate workqueue for journal_write to avoid deadlock
        xen/blkfront: When purging persistent grants, keep them in the buffer
        block: fix deadline elevator drain for zoned block devices
        blk-mq: Allow blocking queue tag iter callbacks
        nvme: properly propagate errors in nvme_mpath_init
      291d0e5d
    • Greg Kroah-Hartman's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e7541773
      Greg Kroah-Hartman authored
      Thomas writes:
        "A single fix for the AMD memory encryption boot code so it does not
         read random garbage instead of the cached encryption bit when a kexec
         kernel is allocated above the 32bit address limit."
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/boot: Fix kexec booting failure in the SEV bit detection code
      e7541773
    • Greg Kroah-Hartman's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e1ce697d
      Greg Kroah-Hartman authored
      Thomas writes:
        "Three small fixes for clocksource drivers:
         - Proper error handling in the Atmel PIT driver
         - Add CLOCK_SOURCE_SUSPEND_NONSTOP for TI SoCs so suspend works again
         - Fix the next event function for Facebook Backpack-CMM BMC chips so
           usleep(100) doesnt sleep several milliseconds"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/timer-atmel-pit: Properly handle error cases
        clocksource/drivers/fttmr010: Fix set_next_event handler
        clocksource/drivers/ti-32k: Add CLOCK_SOURCE_SUSPEND_NONSTOP flag for non-am43 SoCs
      e1ce697d
    • Greg Kroah-Hartman's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · af17b3aa
      Greg Kroah-Hartman authored
      Thomas writes:
        "A single fix for a missing sanity check when a pinned event is tried
        to be read on the wrong CPU due to a legit event scheduling failure."
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/core: Add sanity check to deal with pinned event failure
      af17b3aa
    • Greg Kroah-Hartman's avatar
      Merge tag 'pm-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 82ec752c
      Greg Kroah-Hartman authored
      Rafael writes:
        "Power management fix for 4.19-rc6
      
         Fix incorrect __init and __exit annotations in the Qualcomm
         Kryo cpufreq driver (Nathan Chancellor)."
      
      * tag 'pm-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: qcom-kryo: Fix section annotations
      82ec752c
    • Nathan Chancellor's avatar
      cpufreq: qcom-kryo: Fix section annotations · d51aea13
      Nathan Chancellor authored
      There is currently a warning when building the Kryo cpufreq driver into
      the kernel image:
      
      WARNING: vmlinux.o(.text+0x8aa424): Section mismatch in reference from
      the function qcom_cpufreq_kryo_probe() to the function
      .init.text:qcom_cpufreq_kryo_get_msm_id()
      The function qcom_cpufreq_kryo_probe() references
      the function __init qcom_cpufreq_kryo_get_msm_id().
      This is often because qcom_cpufreq_kryo_probe lacks a __init
      annotation or the annotation of qcom_cpufreq_kryo_get_msm_id is wrong.
      
      Remove the '__init' annotation from qcom_cpufreq_kryo_get_msm_id
      so that there is no more mismatch warning.
      
      Additionally, Nick noticed that the remove function was marked as
      '__init' when it should really be marked as '__exit'.
      
      Fixes: 46e2856b (cpufreq: Add Kryo CPU scaling driver)
      Fixes: 5ad7346b (cpufreq: kryo: Add module remove and exit)
      Reported-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.18+ <stable@vger.kernel.org> # 4.18+
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      d51aea13
    • Greg Kroah-Hartman's avatar
      Merge tag 'dma-mapping-4.19-3' of git://git.infradead.org/users/hch/dma-mapping · 7a6878bb
      Greg Kroah-Hartman authored
      Christoph writes:
        "dma mapping fix for 4.19-rc6
      
         fix a missing Kconfig symbol for commits introduced in 4.19-rc"
      
      * tag 'dma-mapping-4.19-3' of git://git.infradead.org/users/hch/dma-mapping:
        dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration
      7a6878bb
    • Greg Kroah-Hartman's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · e704966c
      Greg Kroah-Hartman authored
      Dmitry writes:
        "Input updates for v4.19-rc5
      
         Just a few driver fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: uinput - allow for max == min during input_absinfo validation
        Input: elantech - enable middle button of touchpad on ThinkPad P72
        Input: atakbd - fix Atari CapsLock behaviour
        Input: atakbd - fix Atari keymap
        Input: egalax_ts - add system wakeup support
        Input: gpio-keys - fix a documentation index issue
      e704966c
    • Greg Kroah-Hartman's avatar
      Merge tag 'spi-fix-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 2f19e7a7
      Greg Kroah-Hartman authored
      Mark writes:
        "spi: Fixes for v4.19
      
         Quite a few fixes for the Renesas drivers in here, plus a fix for the
         Tegra driver and some documentation fixes for the recently added
         spi-mem code.  The Tegra fix is relatively large but fairly
         straightforward and mechanical, it runs on probe so it's been
         reasonably well covered in -next testing."
      
      * tag 'spi-fix-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: spi-mem: Move the DMA-able constraint doc to the kerneldoc header
        spi: spi-mem: Add missing description for data.nbytes field
        spi: rspi: Fix interrupted DMA transfers
        spi: rspi: Fix invalid SPI use during system suspend
        spi: sh-msiof: Fix handling of write value for SISTR register
        spi: sh-msiof: Fix invalid SPI use during system suspend
        spi: gpio: Fix copy-and-paste error
        spi: tegra20-slink: explicitly enable/disable clock
      2f19e7a7
    • Greg Kroah-Hartman's avatar
      Merge tag 'regulator-v4.19-rc5' of... · 8f056611
      Greg Kroah-Hartman authored
      Merge tag 'regulator-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Mark writes:
        "regulator: Fixes for 4.19
      
         A collection of fairly minor bug fixes here, a couple of driver
         specific ones plus two core fixes.  There's one fix for the new
         suspend state code which fixes some confusion with constant values
         that are supposed to indicate noop operation and another fixing a
         race condition with the creation of sysfs files on new regulators."
      
      * tag 'regulator-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: fix crash caused by null driver data
        regulator: Fix 'do-nothing' value for regulators without suspend state
        regulator: da9063: fix DT probing with constraints
        regulator: bd71837: Disable voltage monitoring for LDO3/4
      8f056611
    • Greg Kroah-Hartman's avatar
      Merge tag 'powerpc-4.19-3' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · f005de01
      Greg Kroah-Hartman authored
      Michael writes:
        "powerpc fixes for 4.19 #3
      
         A reasonably big batch of fixes due to me being away for a few weeks.
      
         A fix for the TM emulation support on Power9, which could result in
         corrupting the guest r11 when running under KVM.
      
         Two fixes to the TM code which could lead to userspace GPR corruption
         if we take an SLB miss at exactly the wrong time.
      
         Our dynamic patching code had a bug that meant we could patch freed
         __init text, which could lead to corrupting userspace memory.
      
         csum_ipv6_magic() didn't work on little endian platforms since we
         optimised it recently.
      
         A fix for an endian bug when reading a device tree property telling
         us how many storage keys the machine has available.
      
         Fix a crash seen on some configurations of PowerVM when migrating the
         partition from one machine to another.
      
         A fix for a regression in the setup of our CPU to NUMA node mapping
         in KVM guests.
      
         A fix to our selftest Makefiles to make them work since a recent
         change to the shared Makefile logic."
      
      * tag 'powerpc-4.19-3' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        selftests/powerpc: Fix Makefiles for headers_install change
        powerpc/numa: Use associativity if VPHN hcall is successful
        powerpc/tm: Avoid possible userspace r1 corruption on reclaim
        powerpc/tm: Fix userspace r13 corruption
        powerpc/pseries: Fix unitialized timer reset on migration
        powerpc/pkeys: Fix reading of ibm, processor-storage-keys property
        powerpc: fix csum_ipv6_magic() on little endian platforms
        powerpc/powernv/ioda2: Reduce upper limit for DMA window size (again)
        powerpc: Avoid code patching freed init sections
        KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds
      f005de01
    • Greg Kroah-Hartman's avatar
      Merge tag 'pinctrl-v4.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 900915f9
      Greg Kroah-Hartman authored
      Linus writes:
        "Pin control fixes for v4.19:
         - Fixes to x86 hardware:
         - AMD interrupt debounce issues
         - Faulty Intel cannonlake register offset
         - Revert pin translation IRQ locking"
      
      * tag 'pinctrl-v4.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        Revert "pinctrl: intel: Do pin translation when lock IRQ"
        pinctrl: cannonlake: Fix HOSTSW_OWN register offset of H variant
        pinctrl/amd: poll InterruptEnable bits in amd_gpio_irq_set_type
      900915f9
  3. 28 Sep, 2018 8 commits
  4. 27 Sep, 2018 11 commits