  3. 25 May, 2020 2 commits
sched/core: Offload wakee task activation if the wakee is descheduling · 2ebb1771
      Mel Gorman authored
      The previous commit:
      
        c6e7bd7a: ("sched/core: Optimize ttwu() spinning on p->on_cpu")
      
avoids spinning on p->on_cpu when the task is descheduling, but only if the
      wakee is on a CPU that does not share cache with the waker.
      
      This patch offloads the activation of the wakee to the CPU that is about to
      go idle if the task is the only one on the runqueue. This potentially allows
      the waker task to continue making progress when the wakeup is not strictly
      synchronous.
      
      This is very obvious with netperf UDP_STREAM running on localhost. The
      waker is sending packets as quickly as possible without waiting for any
      reply. It frequently wakes the server for the processing of packets and
      when netserver is using local memory, it quickly completes the processing
and goes back to idle. The waker often observes that netserver is still
on_cpu (descheduling) and spins excessively, leading to a drop in throughput.
      
This compares 5.7-rc6 as-is (vanilla), 5.7-rc6 with "sched: Optimize ttwu()
spinning on p->on_cpu" applied (optttwu-v1r1), and 5.7-rc6 with this patch
also applied (localwakelist-v1r2).
      
                                        5.7.0-rc6              5.7.0-rc6              5.7.0-rc6
                                          vanilla           optttwu-v1r1     localwakelist-v1r2
      Hmean     send-64         251.49 (   0.00%)      258.05 *   2.61%*      305.59 *  21.51%*
      Hmean     send-128        497.86 (   0.00%)      519.89 *   4.43%*      600.25 *  20.57%*
      Hmean     send-256        944.90 (   0.00%)      997.45 *   5.56%*     1140.19 *  20.67%*
      Hmean     send-1024      3779.03 (   0.00%)     3859.18 *   2.12%*     4518.19 *  19.56%*
      Hmean     send-2048      7030.81 (   0.00%)     7315.99 *   4.06%*     8683.01 *  23.50%*
      Hmean     send-3312     10847.44 (   0.00%)    11149.43 *   2.78%*    12896.71 *  18.89%*
      Hmean     send-4096     13436.19 (   0.00%)    13614.09 (   1.32%)    15041.09 *  11.94%*
      Hmean     send-8192     22624.49 (   0.00%)    23265.32 *   2.83%*    24534.96 *   8.44%*
      Hmean     send-16384    34441.87 (   0.00%)    36457.15 *   5.85%*    35986.21 *   4.48%*
      
Note that this benefit is not universal to all wakeups; it only applies
to the case where the waker often spins on p->on_cpu.
      
      The impact can be seen from a "perf sched latency" report generated from
      a single iteration of one packet size:
      
         -----------------------------------------------------------------------------------------------------------------
          Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
         -----------------------------------------------------------------------------------------------------------------
      
        vanilla
          netperf:4337          |  21709.193 ms |     2932 | avg:    0.002 ms | max:    0.041 ms | max at:    112.154512 s
          netserver:4338        |  14629.459 ms |  5146990 | avg:    0.001 ms | max: 1615.864 ms | max at:    140.134496 s
      
        localwakelist-v1r2
          netperf:4339          |  29789.717 ms |     2460 | avg:    0.002 ms | max:    0.059 ms | max at:    138.205389 s
          netserver:4340        |  18858.767 ms |  7279005 | avg:    0.001 ms | max:    0.362 ms | max at:    135.709683 s
         -----------------------------------------------------------------------------------------------------------------
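A report of this shape can be generated with the stock perf tooling. The
netperf invocation below is an assumption about the setup (message size and
duration are illustrative, not taken from the commit); perf sched record and
perf sched latency are the standard commands for capturing and summarizing
scheduler wakeup latency.

```shell
# Start the netperf server daemon, then record scheduler events for a
# single UDP_STREAM iteration against localhost.
netserver
perf sched record -- netperf -t UDP_STREAM -H 127.0.0.1 -l 60 -- -m 64

# Summarize per-task runtime, context switches, and avg/max wakeup delay,
# sorted by maximum delay.
perf sched latency -s max
```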
      
Note that the average wakeup delay is quite small both on the vanilla
kernel and with the two patches applied. However, the vanilla kernel
shows significant outliers, with a maximum measured delay of 1615
milliseconds; with both patches applied the maximum is never worse than
0.362 ms, despite a much higher rate of context switching.
      
Similarly, a separate profile of cycles showed that 2.83% of all cycles
were spent in try_to_wake_up(), with almost half of those cycles spent
spinning on p->on_cpu. With the two patches, the percentage of cycles
spent in try_to_wake_up() drops to 1.13%.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jirka Hladky <jhladky@redhat.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: valentin.schneider@arm.com
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Rik van Riel <riel@surriel.com>
      Link: https://lore.kernel.org/r/20200524202956.27665-3-mgorman@techsingularity.net
      sched/core: Optimize ttwu() spinning on p->on_cpu · c6e7bd7a
      Peter Zijlstra authored
      Both Rik and Mel reported seeing ttwu() spend significant time on:
      
        smp_cond_load_acquire(&p->on_cpu, !VAL);
      
      Attempt to avoid this by queueing the wakeup on the CPU that owns the
      p->on_cpu value. This will then allow the ttwu() to complete without
      further waiting.
      
Since we run schedule() with interrupts disabled, the IPI is
guaranteed to happen after p->on_cpu is cleared; this is what makes it
safe to queue early.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Jirka Hladky <jhladky@redhat.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: valentin.schneider@arm.com
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Rik van Riel <riel@surriel.com>
      Link: https://lore.kernel.org/r/20200524202956.27665-2-mgorman@techsingularity.net