1. 10 Jan, 2018 14 commits
    • Juri Lelli's avatar
      sched/cpufreq: Change the worker kthread to SCHED_DEADLINE · 794a56eb
      Juri Lelli authored
      Worker kthread needs to be able to change frequency for all other
      threads.
      
      Make it special, just under STOP class.
      Signed-off-by: default avatarJuri Lelli <juri.lelli@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Claudio Scordino <claudio@evidence.eu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@santannapisa.it>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: alessio.balsini@arm.com
      Cc: bristot@redhat.com
      Cc: dietmar.eggemann@arm.com
      Cc: joelaf@google.com
      Cc: juri.lelli@redhat.com
      Cc: mathieu.poirier@linaro.org
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: rjw@rjwysocki.net
      Cc: rostedt@goodmis.org
      Cc: tkjos@android.com
      Cc: tommaso.cucinotta@santannapisa.it
      Cc: vincent.guittot@linaro.org
      Link: http://lkml.kernel.org/r/20171204102325.5110-4-juri.lelli@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      794a56eb
    • Juri Lelli's avatar
      sched/deadline: Move CPU frequency selection triggering points · e0367b12
      Juri Lelli authored
      Since SCHED_DEADLINE doesn't track utilization signal (but reserves a
      fraction of CPU bandwidth to tasks admitted to the system), there is no
      point in evaluating frequency changes during each tick event.
      
      Move frequency selection triggering points to where running_bw changes.
      Co-authored-by: default avatarClaudio Scordino <claudio@evidence.eu.com>
      Signed-off-by: default avatarJuri Lelli <juri.lelli@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@santannapisa.it>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: alessio.balsini@arm.com
      Cc: bristot@redhat.com
      Cc: dietmar.eggemann@arm.com
      Cc: joelaf@google.com
      Cc: juri.lelli@redhat.com
      Cc: mathieu.poirier@linaro.org
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: rjw@rjwysocki.net
      Cc: rostedt@goodmis.org
      Cc: tkjos@android.com
      Cc: tommaso.cucinotta@santannapisa.it
      Cc: vincent.guittot@linaro.org
      Link: http://lkml.kernel.org/r/20171204102325.5110-3-juri.lelli@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e0367b12
    • Juri Lelli's avatar
      sched/cpufreq: Use the DEADLINE utilization signal · d4edd662
      Juri Lelli authored
      SCHED_DEADLINE tracks active utilization signal with a per dl_rq
      variable named running_bw.
      
      Make use of that to drive CPU frequency selection: add up FAIR and
      DEADLINE contribution to get the required CPU capacity to handle both
      requirements (while RT still selects max frequency).
      Co-authored-by: default avatarClaudio Scordino <claudio@evidence.eu.com>
      Signed-off-by: default avatarJuri Lelli <juri.lelli@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@santannapisa.it>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: alessio.balsini@arm.com
      Cc: bristot@redhat.com
      Cc: dietmar.eggemann@arm.com
      Cc: joelaf@google.com
      Cc: juri.lelli@redhat.com
      Cc: mathieu.poirier@linaro.org
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: rjw@rjwysocki.net
      Cc: rostedt@goodmis.org
      Cc: tkjos@android.com
      Cc: tommaso.cucinotta@santannapisa.it
      Cc: vincent.guittot@linaro.org
      Link: http://lkml.kernel.org/r/20171204102325.5110-2-juri.lelli@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d4edd662
    • Juri Lelli's avatar
      sched/deadline: Implement "runtime overrun signal" support · 34be3930
      Juri Lelli authored
      This patch adds the possibility of getting the delivery of a SIGXCPU
      signal whenever there is a runtime overrun. The request is done through
      the sched_flags field within the sched_attr structure.
      
      Forward port of https://lkml.org/lkml/2009/10/16/170Tested-by: default avatarMathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: default avatarJuri Lelli <juri.lelli@gmail.com>
      Signed-off-by: default avatarClaudio Scordino <claudio@evidence.eu.com>
      Signed-off-by: default avatarLuca Abeni <luca.abeni@santannapisa.it>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
      Link: http://lkml.kernel.org/r/1513077024-25461-1-git-send-email-claudio@evidence.eu.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      34be3930
    • Mel Gorman's avatar
      sched/fair: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache · 7332dec0
      Mel Gorman authored
      If waking from an idle CPU due to an interrupt then it's possible that
      the waker task will be pulled to wake on the current CPU. Unfortunately,
      depending on the type of interrupt and IRQ configuration, there may not
      be a strong relationship between the CPU an interrupt was delivered on
      and the CPU a task was running on. For example, the interrupts could all
      be delivered to CPUs on one particular node due to the machine topology
      or IRQ affinity configuration. Another example is an interrupt for an IO
      completion which can be delivered to any CPU where there is no guarantee
      the data is either cache hot or even local.
      
      This patch was motivated by the observation that an IO workload was
      being pulled cross-node on a frequent basis when IO completed.  From a
      wakeup latency perspective, it's still useful to know that an idle CPU is
      immediately available for use but lets only consider an automatic migration
      if the CPUs share cache to limit damage due to NUMA migrations. Migrations
      may still occur if wake_affine_weight determines it's appropriate.
      
      These are the throughput results for dbench running on ext4 comparing
      4.15-rc3 and this patch on a 2-socket machine where interrupts due to IO
      completions can happen on any CPU.
      
                                4.15.0-rc3             4.15.0-rc3
                                   vanilla            lessmigrate
      Hmean     1        854.64 (   0.00%)      865.01 (   1.21%)
      Hmean     2       1229.60 (   0.00%)     1274.44 (   3.65%)
      Hmean     4       1591.81 (   0.00%)     1628.08 (   2.28%)
      Hmean     8       1845.04 (   0.00%)     1831.80 (  -0.72%)
      Hmean     16      2038.61 (   0.00%)     2091.44 (   2.59%)
      Hmean     32      2327.19 (   0.00%)     2430.29 (   4.43%)
      Hmean     64      2570.61 (   0.00%)     2568.54 (  -0.08%)
      Hmean     128     2481.89 (   0.00%)     2499.28 (   0.70%)
      Stddev    1         14.31 (   0.00%)        5.35 (  62.65%)
      Stddev    2         21.29 (   0.00%)       11.09 (  47.92%)
      Stddev    4          7.22 (   0.00%)        6.80 (   5.92%)
      Stddev    8         26.70 (   0.00%)        9.41 (  64.76%)
      Stddev    16        22.40 (   0.00%)       20.01 (  10.70%)
      Stddev    32        45.13 (   0.00%)       44.74 (   0.85%)
      Stddev    64        93.10 (   0.00%)       93.18 (  -0.09%)
      Stddev    128      184.28 (   0.00%)      177.85 (   3.49%)
      
      Note the small increase in throughput for low thread counts but also
      note that the standard deviation for each sample during the test run is
      lower. The throughput figures for dbench can be misleading so the benchmark
      is actually modified to time the latency of the processing of one load
      file with many samples taken. The difference in latency is
      
                                 4.15.0-rc3             4.15.0-rc3
                                    vanilla            lessmigrate
      Amean      1         21.71 (   0.00%)       21.47 (   1.08%)
      Amean      2         30.89 (   0.00%)       29.58 (   4.26%)
      Amean      4         47.54 (   0.00%)       46.61 (   1.97%)
      Amean      8         82.71 (   0.00%)       82.81 (  -0.12%)
      Amean      16       149.45 (   0.00%)      145.01 (   2.97%)
      Amean      32       265.49 (   0.00%)      248.43 (   6.42%)
      Amean      64       463.23 (   0.00%)      463.55 (  -0.07%)
      Amean      128      933.97 (   0.00%)      935.50 (  -0.16%)
      Stddev     1          1.58 (   0.00%)        1.54 (   2.26%)
      Stddev     2          2.84 (   0.00%)        2.95 (  -4.15%)
      Stddev     4          6.78 (   0.00%)        6.85 (  -0.99%)
      Stddev     8         16.85 (   0.00%)       16.37 (   2.85%)
      Stddev     16        41.59 (   0.00%)       41.04 (   1.32%)
      Stddev     32       111.05 (   0.00%)      105.11 (   5.35%)
      Stddev     64       285.94 (   0.00%)      288.01 (  -0.72%)
      Stddev     128      803.39 (   0.00%)      809.73 (  -0.79%)
      
      It's a small improvement which is not surprising given that migrations that
      migrate to a different node as not that common. However, it is noticeable
      in the CPU migration statistics which are reduced by 24%.
      
      There was a query for v1 of this patch about NAS so here are the results
      for C-class using MPI for parallelisation on the same machine
      
      nas-mpi
                            4.15.0-rc3             4.15.0-rc3
                               vanilla                  noirq
      Time cg.C       24.25 (   0.00%)       23.17 (   4.45%)
      Time ep.C        8.22 (   0.00%)        8.29 (  -0.85%)
      Time ft.C       22.67 (   0.00%)       20.34 (  10.28%)
      Time is.C        1.42 (   0.00%)        1.47 (  -3.52%)
      Time lu.C       55.62 (   0.00%)       54.81 (   1.46%)
      Time mg.C        7.93 (   0.00%)        7.91 (   0.25%)
      
                4.15.0-rc3  4.15.0-rc3
                   vanilla  noirq-v1r1
      User         3799.96     3748.34
      System        672.10      626.15
      Elapsed        91.91       79.49
      
      lu.C sees a small gain, ft.C a large gain and ep.C and is.C see small
      regressions but in terms of absolute time, the difference is small and
      likely within run-to-run variance. System CPU usage is slightly reduced.
      
      schbench from Facebook was also requested. This is a bit of a mixed bag but
      it's important to note that this workload should not be heavily impacted
      by wakeups from interrupt context.
      
                                       4.15.0-rc3             4.15.0-rc3
                                          vanilla             noirq-v1r1
      Lat 50.00th-qrtle-1        41.00 (   0.00%)       41.00 (   0.00%)
      Lat 75.00th-qrtle-1        42.00 (   0.00%)       42.00 (   0.00%)
      Lat 90.00th-qrtle-1        43.00 (   0.00%)       44.00 (  -2.33%)
      Lat 95.00th-qrtle-1        44.00 (   0.00%)       46.00 (  -4.55%)
      Lat 99.00th-qrtle-1        57.00 (   0.00%)       58.00 (  -1.75%)
      Lat 99.50th-qrtle-1        59.00 (   0.00%)       59.00 (   0.00%)
      Lat 99.90th-qrtle-1        67.00 (   0.00%)       78.00 ( -16.42%)
      Lat 50.00th-qrtle-2        40.00 (   0.00%)       51.00 ( -27.50%)
      Lat 75.00th-qrtle-2        45.00 (   0.00%)       56.00 ( -24.44%)
      Lat 90.00th-qrtle-2        53.00 (   0.00%)       59.00 ( -11.32%)
      Lat 95.00th-qrtle-2        57.00 (   0.00%)       61.00 (  -7.02%)
      Lat 99.00th-qrtle-2        67.00 (   0.00%)       71.00 (  -5.97%)
      Lat 99.50th-qrtle-2        69.00 (   0.00%)       74.00 (  -7.25%)
      Lat 99.90th-qrtle-2        83.00 (   0.00%)       77.00 (   7.23%)
      Lat 50.00th-qrtle-4        51.00 (   0.00%)       51.00 (   0.00%)
      Lat 75.00th-qrtle-4        57.00 (   0.00%)       56.00 (   1.75%)
      Lat 90.00th-qrtle-4        60.00 (   0.00%)       59.00 (   1.67%)
      Lat 95.00th-qrtle-4        62.00 (   0.00%)       62.00 (   0.00%)
      Lat 99.00th-qrtle-4        73.00 (   0.00%)       72.00 (   1.37%)
      Lat 99.50th-qrtle-4        76.00 (   0.00%)       74.00 (   2.63%)
      Lat 99.90th-qrtle-4        85.00 (   0.00%)       78.00 (   8.24%)
      Lat 50.00th-qrtle-8        54.00 (   0.00%)       58.00 (  -7.41%)
      Lat 75.00th-qrtle-8        59.00 (   0.00%)       62.00 (  -5.08%)
      Lat 90.00th-qrtle-8        65.00 (   0.00%)       66.00 (  -1.54%)
      Lat 95.00th-qrtle-8        67.00 (   0.00%)       70.00 (  -4.48%)
      Lat 99.00th-qrtle-8        78.00 (   0.00%)       79.00 (  -1.28%)
      Lat 99.50th-qrtle-8        81.00 (   0.00%)       80.00 (   1.23%)
      Lat 99.90th-qrtle-8       116.00 (   0.00%)       83.00 (  28.45%)
      Lat 50.00th-qrtle-16       65.00 (   0.00%)       64.00 (   1.54%)
      Lat 75.00th-qrtle-16       77.00 (   0.00%)       71.00 (   7.79%)
      Lat 90.00th-qrtle-16       83.00 (   0.00%)       82.00 (   1.20%)
      Lat 95.00th-qrtle-16       87.00 (   0.00%)       87.00 (   0.00%)
      Lat 99.00th-qrtle-16       95.00 (   0.00%)       96.00 (  -1.05%)
      Lat 99.50th-qrtle-16       99.00 (   0.00%)      103.00 (  -4.04%)
      Lat 99.90th-qrtle-16      104.00 (   0.00%)      122.00 ( -17.31%)
      Lat 50.00th-qrtle-32       71.00 (   0.00%)       73.00 (  -2.82%)
      Lat 75.00th-qrtle-32       91.00 (   0.00%)       92.00 (  -1.10%)
      Lat 90.00th-qrtle-32      108.00 (   0.00%)      107.00 (   0.93%)
      Lat 95.00th-qrtle-32      118.00 (   0.00%)      115.00 (   2.54%)
      Lat 99.00th-qrtle-32      134.00 (   0.00%)      129.00 (   3.73%)
      Lat 99.50th-qrtle-32      138.00 (   0.00%)      133.00 (   3.62%)
      Lat 99.90th-qrtle-32      149.00 (   0.00%)      146.00 (   2.01%)
      Lat 50.00th-qrtle-39       83.00 (   0.00%)       81.00 (   2.41%)
      Lat 75.00th-qrtle-39      105.00 (   0.00%)      102.00 (   2.86%)
      Lat 90.00th-qrtle-39      120.00 (   0.00%)      119.00 (   0.83%)
      Lat 95.00th-qrtle-39      129.00 (   0.00%)      128.00 (   0.78%)
      Lat 99.00th-qrtle-39      153.00 (   0.00%)      149.00 (   2.61%)
      Lat 99.50th-qrtle-39      166.00 (   0.00%)      156.00 (   6.02%)
      Lat 99.90th-qrtle-39    12304.00 (   0.00%)    12848.00 (  -4.42%)
      
      When heavily loaded (e.g. 99.50th-qrtle-39 indicates 39 threads), there
      are small gains in many cases. Otherwise it depends on the quartile used
      where it can be bad -- e.g. 75.00th-qrtle-2. However, even these results
      are probably a co-incidence. For this workload, much depends on what node
      the threads get placed on and their relative locality and not wakeups from
      interrupt context. A larger component on how it behaves would be automatic
      NUMA balancing where a fault incurred to measure locality would be a much
      larger contributer to latency than the wakeup path.
      
      This is the results from an almost identical machine that happened to run
      the same test.  They only differ in terms of storage which is irrelevant
      for this test.
      
                                       4.15.0-rc3             4.15.0-rc3
                                          vanilla             noirq-v1r1
      Lat 50.00th-qrtle-1        41.00 (   0.00%)       41.00 (   0.00%)
      Lat 75.00th-qrtle-1        42.00 (   0.00%)       42.00 (   0.00%)
      Lat 90.00th-qrtle-1        44.00 (   0.00%)       43.00 (   2.27%)
      Lat 95.00th-qrtle-1        53.00 (   0.00%)       45.00 (  15.09%)
      Lat 99.00th-qrtle-1        59.00 (   0.00%)       58.00 (   1.69%)
      Lat 99.50th-qrtle-1        60.00 (   0.00%)       59.00 (   1.67%)
      Lat 99.90th-qrtle-1        86.00 (   0.00%)       61.00 (  29.07%)
      Lat 50.00th-qrtle-2        52.00 (   0.00%)       41.00 (  21.15%)
      Lat 75.00th-qrtle-2        57.00 (   0.00%)       46.00 (  19.30%)
      Lat 90.00th-qrtle-2        60.00 (   0.00%)       53.00 (  11.67%)
      Lat 95.00th-qrtle-2        62.00 (   0.00%)       57.00 (   8.06%)
      Lat 99.00th-qrtle-2        73.00 (   0.00%)       68.00 (   6.85%)
      Lat 99.50th-qrtle-2        74.00 (   0.00%)       71.00 (   4.05%)
      Lat 99.90th-qrtle-2        90.00 (   0.00%)       75.00 (  16.67%)
      Lat 50.00th-qrtle-4        57.00 (   0.00%)       52.00 (   8.77%)
      Lat 75.00th-qrtle-4        60.00 (   0.00%)       58.00 (   3.33%)
      Lat 90.00th-qrtle-4        62.00 (   0.00%)       62.00 (   0.00%)
      Lat 95.00th-qrtle-4        65.00 (   0.00%)       65.00 (   0.00%)
      Lat 99.00th-qrtle-4        76.00 (   0.00%)       75.00 (   1.32%)
      Lat 99.50th-qrtle-4        77.00 (   0.00%)       77.00 (   0.00%)
      Lat 99.90th-qrtle-4        87.00 (   0.00%)       81.00 (   6.90%)
      Lat 50.00th-qrtle-8        59.00 (   0.00%)       57.00 (   3.39%)
      Lat 75.00th-qrtle-8        63.00 (   0.00%)       62.00 (   1.59%)
      Lat 90.00th-qrtle-8        66.00 (   0.00%)       67.00 (  -1.52%)
      Lat 95.00th-qrtle-8        68.00 (   0.00%)       70.00 (  -2.94%)
      Lat 99.00th-qrtle-8        79.00 (   0.00%)       80.00 (  -1.27%)
      Lat 99.50th-qrtle-8        80.00 (   0.00%)       84.00 (  -5.00%)
      Lat 99.90th-qrtle-8        84.00 (   0.00%)       90.00 (  -7.14%)
      Lat 50.00th-qrtle-16       65.00 (   0.00%)       65.00 (   0.00%)
      Lat 75.00th-qrtle-16       77.00 (   0.00%)       75.00 (   2.60%)
      Lat 90.00th-qrtle-16       84.00 (   0.00%)       83.00 (   1.19%)
      Lat 95.00th-qrtle-16       88.00 (   0.00%)       87.00 (   1.14%)
      Lat 99.00th-qrtle-16       97.00 (   0.00%)       96.00 (   1.03%)
      Lat 99.50th-qrtle-16      100.00 (   0.00%)      104.00 (  -4.00%)
      Lat 99.90th-qrtle-16      110.00 (   0.00%)      126.00 ( -14.55%)
      Lat 50.00th-qrtle-32       70.00 (   0.00%)       71.00 (  -1.43%)
      Lat 75.00th-qrtle-32       92.00 (   0.00%)       94.00 (  -2.17%)
      Lat 90.00th-qrtle-32      110.00 (   0.00%)      110.00 (   0.00%)
      Lat 95.00th-qrtle-32      121.00 (   0.00%)      118.00 (   2.48%)
      Lat 99.00th-qrtle-32      135.00 (   0.00%)      137.00 (  -1.48%)
      Lat 99.50th-qrtle-32      140.00 (   0.00%)      146.00 (  -4.29%)
      Lat 99.90th-qrtle-32      150.00 (   0.00%)      160.00 (  -6.67%)
      Lat 50.00th-qrtle-39       80.00 (   0.00%)       71.00 (  11.25%)
      Lat 75.00th-qrtle-39      102.00 (   0.00%)       91.00 (  10.78%)
      Lat 90.00th-qrtle-39      118.00 (   0.00%)      108.00 (   8.47%)
      Lat 95.00th-qrtle-39      128.00 (   0.00%)      117.00 (   8.59%)
      Lat 99.00th-qrtle-39      149.00 (   0.00%)      133.00 (  10.74%)
      Lat 99.50th-qrtle-39      160.00 (   0.00%)      139.00 (  13.12%)
      Lat 99.90th-qrtle-39    13808.00 (   0.00%)     4920.00 (  64.37%)
      
      Despite being nearly identical, it showed a variety of major gains so
      I'm not convinced that heavy emphasis should be placed on this particular
      workload in terms of evaluating this particular patch. Further evidence of
      this is the fact that testing on a UMA machine showed small gains/losses
      even though the patch should be a no-op on UMA.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171219085947.13136-2-mgorman@techsingularity.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7332dec0
    • Joel Fernandes's avatar
      sched/fair: Correct obsolete comment about cpufreq_update_util() · 9783be2c
      Joel Fernandes authored
      Since the remote cpufreq callback work, the cpufreq_update_util() call can happen
      from remote CPUs. The comment about local CPUs is thus obsolete. Update it
      accordingly.
      Signed-off-by: default avatarJoel Fernandes <joelaf@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Cc: Android Kernel <kernel-team@android.com>
      Cc: Atish Patra <atish.patra@oracle.com>
      Cc: Chris Redpath <Chris.Redpath@arm.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: EAS Dev <eas-dev@lists.linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Ramussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Rohit Jain <rohit.k.jain@oracle.com>
      Cc: Saravana Kannan <skannan@quicinc.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vikram Mulukutla <markivx@codeaurora.org>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Link: http://lkml.kernel.org/r/20171215153944.220146-2-joelaf@google.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9783be2c
    • Joel Fernandes's avatar
      sched/fair: Remove impossible condition from find_idlest_group_cpu() · 18cec7e0
      Joel Fernandes authored
      find_idlest_group_cpu() goes through CPUs of a group previous selected by
      find_idlest_group(). find_idlest_group() returns NULL if the local group is the
      selected one and doesn't execute find_idlest_group_cpu if the group to which
      'cpu' belongs to is chosen. So we're always guaranteed to call
      find_idlest_group_cpu() with a group to which 'cpu' is non-local.
      
      This makes one of the conditions in find_idlest_group_cpu() an impossible one,
      which we can get rid off.
      Signed-off-by: default avatarJoel Fernandes <joelaf@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarBrendan Jackman <brendan.jackman@arm.com>
      Reviewed-by: default avatarVincent Guittot <vincent.guittot@linaro.org>
      Cc: Android Kernel <kernel-team@android.com>
      Cc: Atish Patra <atish.patra@oracle.com>
      Cc: Chris Redpath <Chris.Redpath@arm.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: EAS Dev <eas-dev@lists.linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Ramussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Rohit Jain <rohit.k.jain@oracle.com>
      Cc: Saravana Kannan <skannan@quicinc.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vikram Mulukutla <markivx@codeaurora.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20171215153944.220146-3-joelaf@google.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      18cec7e0
    • Viresh Kumar's avatar
      sched/cpufreq: Don't pass flags to sugov_set_iowait_boost() · 5083452f
      Viresh Kumar authored
      We are already passing sg_cpu as argument to sugov_set_iowait_boost()
      helper and the same can be used to retrieve the flags value. Get rid of
      the redundant argument.
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: dietmar.eggemann@arm.com
      Cc: joelaf@google.com
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: tkjos@android.com
      Link: http://lkml.kernel.org/r/4ec5562b1a87e146ebab11fb5dde1ca9c763a7fb.1513158452.git.viresh.kumar@linaro.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5083452f
    • Viresh Kumar's avatar
      sched/cpufreq: Initialize sg_cpu->flags to 0 · 6257e704
      Viresh Kumar authored
      Initializing sg_cpu->flags to SCHED_CPUFREQ_RT has no obvious benefit.
      The flags field wouldn't be used until the utilization update handler is
      called for the first time, and once that is called we will overwrite
      flags anyway.
      
      Initialize it to 0.
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarJuri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: dietmar.eggemann@arm.com
      Cc: joelaf@google.com
      Cc: morten.rasmussen@arm.com
      Cc: tkjos@android.com
      Link: http://lkml.kernel.org/r/763feda6424ced8486b25a0c52979634e6104478.1513158452.git.viresh.kumar@linaro.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6257e704
    • Joel Fernandes's avatar
      sched/fair: Consider RT/IRQ pressure in capacity_spare_wake() · f453ae22
      Joel Fernandes authored
      capacity_spare_wake() in the slow path influences choice of idlest groups,
      as we search for groups with maximum spare capacity. In scenarios where
      RT pressure is high, a sub optimal group can be chosen and hurt
      performance of the task being woken up.
      
      Fix this by using capacity_of() instead of capacity_orig_of() in capacity_spare_wake().
      
      Tests results from improvements with this change are below. More tests
      were also done by myself and Matt Fleming to ensure no degradation in
      different benchmarks.
      
      1) Rohit ran barrier.c test (details below) with following improvements:
      ------------------------------------------------------------------------
      This was Rohit's original use case for a patch he posted at [1] however
      from his recent tests he showed my patch can replace his slow path
      changes [1] and there's no need to selectively scan/skip CPUs in
      find_idlest_group_cpu in the slow path to get the improvement he sees.
      
      barrier.c (open_mp code) as a micro-benchmark. It does a number of
      iterations and barrier sync at the end of each for loop.
      
      Here barrier,c is running in along with ping on CPU 0 and 1 as:
      'ping -l 10000 -q -s 10 -f hostX'
      
      barrier.c can be found at:
      http://www.spinics.net/lists/kernel/msg2506955.html
      
      Following are the results for the iterations per second with this
      micro-benchmark (higher is better), on a 44 core, 2 socket 88 Threads
      Intel x86 machine:
      +--------+------------------+---------------------------+
      |Threads | Without patch    | With patch                |
      |        |                  |                           |
      +--------+--------+---------+-----------------+---------+
      |        | Mean   | Std Dev | Mean            | Std Dev |
      +--------+--------+---------+-----------------+---------+
      |1       | 539.36 | 60.16   | 572.54 (+6.15%) | 40.95   |
      |2       | 481.01 | 19.32   | 530.64 (+10.32%)| 56.16   |
      |4       | 474.78 | 22.28   | 479.46 (+0.99%) | 18.89   |
      |8       | 450.06 | 24.91   | 447.82 (-0.50%) | 12.36   |
      |16      | 436.99 | 22.57   | 441.88 (+1.12%) | 7.39    |
      |32      | 388.28 | 55.59   | 429.4  (+10.59%)| 31.14   |
      |64      | 314.62 | 6.33    | 311.81 (-0.89%) | 11.99   |
      +--------+--------+---------+-----------------+---------+
      
      2) ping+hackbench test on bare-metal sever (by Rohit)
      -----------------------------------------------------
      Here hackbench is running in threaded mode along
      with, running ping on CPU 0 and 1 as:
      'ping -l 10000 -q -s 10 -f hostX'
      
      This test is running on 2 socket, 20 core and 40 threads Intel x86
      machine:
      Number of loops is 10000 and runtime is in seconds (Lower is better).
      
      +--------------+-----------------+--------------------------+
      |Task Groups   | Without patch   |  With patch              |
      |              +-------+---------+----------------+---------+
      |(Groups of 40)| Mean  | Std Dev |  Mean          | Std Dev |
      +--------------+-------+---------+----------------+---------+
      |1             | 0.851 | 0.007   |  0.828 (+2.77%)| 0.032   |
      |2             | 1.083 | 0.203   |  1.087 (-0.37%)| 0.246   |
      |4             | 1.601 | 0.051   |  1.611 (-0.62%)| 0.055   |
      |8             | 2.837 | 0.060   |  2.827 (+0.35%)| 0.031   |
      |16            | 5.139 | 0.133   |  5.107 (+0.63%)| 0.085   |
      |25            | 7.569 | 0.142   |  7.503 (+0.88%)| 0.143   |
      +--------------+-------+---------+----------------+---------+
      
      [1] https://patchwork.kernel.org/patch/9991635/
      
      Matt Fleming also ran several different hackbench tests and cyclic test
      to santiy-check that the patch doesn't harm other usecases.
      Tested-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Tested-by: default avatarRohit Jain <rohit.k.jain@oracle.com>
      Signed-off-by: default avatarJoel Fernandes <joelaf@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarVincent Guittot <vincent.guittot@linaro.org>
      Reviewed-by: default avatarDietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Atish Patra <atish.patra@oracle.com>
      Cc: Brendan Jackman <brendan.jackman@arm.com>
      Cc: Chris Redpath <Chris.Redpath@arm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Ramussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Saravana Kannan <skannan@quicinc.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vikram Mulukutla <markivx@codeaurora.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20171214212158.188190-1-joelaf@google.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f453ae22
    • Patrick Bellasi's avatar
      sched/fair: Use 'unsigned long' for utilization, consistently · f01415fd
      Patrick Bellasi authored
      Utilization and capacity are tracked as 'unsigned long', however some
      functions using them return an 'int' which is ultimately assigned back to
      'unsigned long' variables.
      
      Since there is not scope on using a different and signed type,
      consolidate the signature of functions returning utilization to always
      use the native type.
      
      This change improves code consistency, and it also benefits
      code paths where utilizations should be clamped by avoiding
      further type conversions or ugly type casts.
      Signed-off-by: default avatarPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarChris Redpath <chris.redpath@arm.com>
      Reviewed-by: default avatarBrendan Jackman <brendan.jackman@arm.com>
      Reviewed-by: default avatarDietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@android.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20171205171018.9203-2-patrick.bellasi@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f01415fd
    • rodrigosiqueira's avatar
      sched/core: Rework and clarify prepare_lock_switch() · 31cb1bc0
      rodrigosiqueira authored
      The prepare_lock_switch() function has an unused parameter, and also the
      function name was not descriptive. To improve readability and remove
      the extra parameter, do the following changes:
      
      * Move prepare_lock_switch() from kernel/sched/sched.h to
        kernel/sched/core.c, rename it to prepare_task(), and remove the
        unused parameter.
      
      * Split the smp_store_release() out from finish_lock_switch() to a
        function named finish_task.
      
      * Comments ajdustments.
      Signed-off-by: default avatarRodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171215140603.gxe5i2y6fg5ojfpp@smtp.gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      31cb1bc0
    • Ingo Molnar's avatar
    • Mathieu Desnoyers's avatar
      membarrier: Disable preemption when calling smp_call_function_many() · 54167607
      Mathieu Desnoyers authored
      smp_call_function_many() requires disabling preemption around the call.
      Signed-off-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: <stable@vger.kernel.org> # v4.14+
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171215192310.25293-1-mathieu.desnoyers@efficios.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      54167607
  2. 08 Jan, 2018 1 commit
  3. 06 Jan, 2018 1 commit
  4. 05 Jan, 2018 22 commits
  5. 04 Jan, 2018 2 commits
    • Thomas Gleixner's avatar
      x86/tlb: Drop the _GPL from the cpu_tlbstate export · 1e547681
      Thomas Gleixner authored
      The recent changes for PTI touch cpu_tlbstate from various tlb_flush
      inlines. cpu_tlbstate is exported as GPL symbol, so this causes a
      regression when building out of tree drivers for certain graphics cards.
      
      Aside of that the export was wrong since it was introduced as it should
      have been EXPORT_PER_CPU_SYMBOL_GPL().
      
      Use the correct PER_CPU export and drop the _GPL to restore the previous
      state which allows users to utilize the cards they payed for.
      
      As always I'm really thrilled to make this kind of change to support the
      #friends (or however the hot hashtag of today is spelled) from that closet
      sauce graphics corp.
      
      Fixes: 1e02ce4c ("x86: Store a per-cpu shadow copy of CR4")
      Fixes: 6fd166aa ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
      Reported-by: default avatarKees Cook <keescook@google.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      1e547681
    • Peter Zijlstra's avatar
      x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers · 42f3bdc5
      Peter Zijlstra authored
      Thomas reported the following warning:
      
       BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
       caller is native_flush_tlb_single+0x57/0xc0
       native_flush_tlb_single+0x57/0xc0
       __set_pte_vaddr+0x2d/0x40
       set_pte_vaddr+0x2f/0x40
       cea_set_pte+0x30/0x40
       ds_update_cea.constprop.4+0x4d/0x70
       reserve_ds_buffers+0x159/0x410
       x86_reserve_hardware+0x150/0x160
       x86_pmu_event_init+0x3e/0x1f0
       perf_try_init_event+0x69/0x80
       perf_event_alloc+0x652/0x740
       SyS_perf_event_open+0x3f6/0xd60
       do_syscall_64+0x5c/0x190
      
      set_pte_vaddr is used to map the ds buffers into the cpu entry area, but
      there are two problems with that:
      
       1) The resulting flush is not supposed to be called in preemptible context
      
       2) The cpu entry area is supposed to be per CPU, but the debug store
          buffers are mapped for all CPUs so these mappings need to be flushed
          globally.
      
      Add the necessary preemption protection across the mapping code and flush
      TLBs globally.
      
      Fixes: c1961a46 ("x86/events/intel/ds: Map debug buffers in cpu_entry_area")
      Reported-by: default avatarThomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarThomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180104170712.GB3040@hirez.programming.kicks-ass.net
      42f3bdc5