1. 28 May, 2020 3 commits
  2. 26 May, 2020 1 commit
  3. 25 May, 2020 2 commits
    • sched/core: Offload wakee task activation if it the wakee is descheduling · 2ebb1771
      Mel Gorman authored
      The previous commit:
      
        c6e7bd7a: ("sched/core: Optimize ttwu() spinning on p->on_cpu")
      
      avoids spinning on p->on_cpu when the task is descheduling, but only if the
      wakee is on a CPU that does not share cache with the waker.
      
      This patch offloads the activation of the wakee to the CPU that is about to
      go idle if the task is the only one on the runqueue. This potentially allows
      the waker task to continue making progress when the wakeup is not strictly
      synchronous.
      
      This is very obvious with netperf UDP_STREAM running on localhost. The
      waker is sending packets as quickly as possible without waiting for any
      reply. It frequently wakes the server for the processing of packets and
      when netserver is using local memory, it quickly completes the processing
      and goes back to idle. The waker often observes that netserver is still
      on_cpu and spins excessively, leading to a drop in throughput.
      
      This is a comparison of 5.7-rc6 (labeled vanilla), 5.7-rc6 plus
      "sched/core: Optimize ttwu() spinning on p->on_cpu" (optttwu-v1r1), and
      5.7-rc6 plus both that commit and this patch (localwakelist-v1r2).
      
                                        5.7.0-rc6              5.7.0-rc6              5.7.0-rc6
                                          vanilla           optttwu-v1r1     localwakelist-v1r2
      Hmean     send-64         251.49 (   0.00%)      258.05 *   2.61%*      305.59 *  21.51%*
      Hmean     send-128        497.86 (   0.00%)      519.89 *   4.43%*      600.25 *  20.57%*
      Hmean     send-256        944.90 (   0.00%)      997.45 *   5.56%*     1140.19 *  20.67%*
      Hmean     send-1024      3779.03 (   0.00%)     3859.18 *   2.12%*     4518.19 *  19.56%*
      Hmean     send-2048      7030.81 (   0.00%)     7315.99 *   4.06%*     8683.01 *  23.50%*
      Hmean     send-3312     10847.44 (   0.00%)    11149.43 *   2.78%*    12896.71 *  18.89%*
      Hmean     send-4096     13436.19 (   0.00%)    13614.09 (   1.32%)    15041.09 *  11.94%*
      Hmean     send-8192     22624.49 (   0.00%)    23265.32 *   2.83%*    24534.96 *   8.44%*
      Hmean     send-16384    34441.87 (   0.00%)    36457.15 *   5.85%*    35986.21 *   4.48%*
      
      Note that this benefit is not universal to all wakeups; it only applies
      to the case where the waker often spins on p->on_cpu.
      
      The impact can be seen from a "perf sched latency" report generated from
      a single iteration of one packet size:
      
         -----------------------------------------------------------------------------------------------------------------
          Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
         -----------------------------------------------------------------------------------------------------------------
      
        vanilla
          netperf:4337          |  21709.193 ms |     2932 | avg:    0.002 ms | max:    0.041 ms | max at:    112.154512 s
          netserver:4338        |  14629.459 ms |  5146990 | avg:    0.001 ms | max: 1615.864 ms | max at:    140.134496 s
      
        localwakelist-v1r2
          netperf:4339          |  29789.717 ms |     2460 | avg:    0.002 ms | max:    0.059 ms | max at:    138.205389 s
          netserver:4340        |  18858.767 ms |  7279005 | avg:    0.001 ms | max:    0.362 ms | max at:    135.709683 s
         -----------------------------------------------------------------------------------------------------------------
      
      Note that the average wakeup delay is quite small on both the vanilla
      kernel and with the two patches applied. However, the vanilla kernel
      shows significant outliers: its maximum delay was 1615 milliseconds,
      whereas with both patches applied the delay was never worse than
      0.362 ms, and the rate of context switching was much higher.
      
      Similarly, a separate profile of cycles showed that 2.83% of all cycles
      were spent in try_to_wake_up(), with almost half of those cycles spent
      spinning on p->on_cpu. With the two patches, the percentage of cycles
      spent in try_to_wake_up() drops to 1.13%.

      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jirka Hladky <jhladky@redhat.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: valentin.schneider@arm.com
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Rik van Riel <riel@surriel.com>
      Link: https://lore.kernel.org/r/20200524202956.27665-3-mgorman@techsingularity.net
    • sched/core: Optimize ttwu() spinning on p->on_cpu · c6e7bd7a
      Peter Zijlstra authored
      Both Rik and Mel reported seeing ttwu() spend significant time on:
      
        smp_cond_load_acquire(&p->on_cpu, !VAL);
      
      Attempt to avoid this by queueing the wakeup on the CPU that owns the
      p->on_cpu value. This will then allow the ttwu() to complete without
      further waiting.
      
      Since we run schedule() with interrupts disabled, the IPI is
      guaranteed to happen after p->on_cpu is cleared; this is what makes it
      safe to queue early.

      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Jirka Hladky <jhladky@redhat.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: valentin.schneider@arm.com
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Rik van Riel <riel@surriel.com>
      Link: https://lore.kernel.org/r/20200524202956.27665-2-mgorman@techsingularity.net
  4. 19 May, 2020 26 commits
  5. 17 May, 2020 8 commits
    • Linux 5.7-rc6 · b9bbe6ed
      Linus Torvalds authored
    • Merge tag 'for-linus-5.7-2' of git://github.com/cminyard/linux-ipmi · 8feea623
      Linus Torvalds authored
      Pull IPMI update from Corey Minyard:
       "Convert i2c_new_device() to i2c_new_client_device()
      
        Wolfram Sang has asked to have this included in 5.7 so the deprecated
        API can be removed next release. There should be no functional
        difference.
      
        I think this entire section of code can be removed; it is
        leftover from other things that have since changed, but this is the
        safer thing to do for now. The full removal can happen next release."
      
      * tag 'for-linus-5.7-2' of git://github.com/cminyard/linux-ipmi:
        char: ipmi: convert to use i2c_new_client_device()
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 9b1f2cbd
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "Some more clk driver fixes and one core framework fix:
      
         - A handful of TI driver fixes for bad of_node_put() and incorrect
           parent names
      
         - Rockchip rk3228 aclk_gpu* creation was interfering with lima GPU
           work so we use a composite clk now
      
         - Resuming from suspend on Tegra Jetson TK1 was broken because an
           audio PLL calculated an incorrect rate
      
         - A fix for devicetree probing on IM-PD1 by actually specifying a clk
           name which is required to pass clk registration
      
         - Avoid list corruption if registration fails for a critical clk"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: ti: clkctrl: convert subclocks to use proper names also
        clk: ti: am33xx: fix RTC clock parent
        clk: ti: clkctrl: Fix Bad of_node_put within clkctrl_get_name
        clk: tegra: Fix initial rate for pll_a on Tegra124
        clk: impd1: Look up clock-output-names
        clk: Unlink clock if failed to prepare or enable
        clk: rockchip: fix incorrect configuration of rk3228 aclk_gpu* clocks
    • Merge tag 'usb-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · fb27bc03
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a number of USB fixes for 5.7-rc6
      
        The "largest" in here is a bunch of raw-gadget fixes and api changes
        as the driver just showed up in -rc1 and work has been done to fix up
        some uapi issues found with the original submission, before it shows
        up in a -final release.
      
        Other than that, a bunch of other small USB gadget fixes, xhci fixes,
        some quirks, and other tiny fixes for reported issues.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
        USB: gadget: fix illegal array access in binding with UDC
        usb: core: hub: limit HUB_QUIRK_DISABLE_AUTOSUSPEND to USB5534B
        USB: usbfs: fix mmap dma mismatch
        usb: host: xhci-plat: keep runtime active when removing host
        usb: xhci: Fix NULL pointer dereference when enqueuing trbs from urb sg list
        usb: cdns3: gadget: make a bunch of functions static
        usb: mtu3: constify struct debugfs_reg32
        usb: gadget: udc: atmel: Make some symbols static
        usb: raw-gadget: fix null-ptr-deref when reenabling endpoints
        usb: raw-gadget: documentation updates
        usb: raw-gadget: support stalling/halting/wedging endpoints
        usb: raw-gadget: fix gadget endpoint selection
        usb: raw-gadget: improve uapi headers comments
        usb: typec: mux: intel: Fix DP_HPD_LVL bit field
        usb: raw-gadget: fix return value of ep read ioctls
        usb: dwc3: select USB_ROLE_SWITCH
        usb: gadget: legacy: fix error return code in gncm_bind()
        usb: gadget: legacy: fix error return code in cdc_bind()
        usb: gadget: legacy: fix redundant initialization warnings
        usb: gadget: tegra-xudc: Fix idle suspend/resume
        ...
    • Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · b48397cb
      Linus Torvalds authored
      Pull execve fix from Eric Biederman:
       "While working on my exec cleanups I found a bug in exec that I
        introduced by accident a couple of years ago. I apparently missed the
        fact that bprm->file can change.
      
        Now I have a very personal motive to clean up exec and make it more
        approachable.
      
        The change is just moving would_dump to where it acts on the final
        bprm->file, not the initial bprm->file. I have been careful, and I
        have tested and verified that this fix works"
      
      * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        exec: Move would_dump into flush_old_exec
    • Merge tag 'objtool-urgent-2020-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ef0d5b91
      Linus Torvalds authored
      Pull x86 stack unwinding fix from Thomas Gleixner:
       "A single bugfix for the ORC unwinder to ensure that the error flag
        which tells the unwinding code whether a stack trace can be trusted or
        not is always set correctly.
      
        This was messed up by a couple of changes in the recent past"
      
      * tag 'objtool-urgent-2020-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/unwind/orc: Fix error handling in __unwind_start()
    • Merge tag 'x86_urgent_for_v5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 43567139
      Linus Torvalds authored
      Pull x86 fix from Borislav Petkov:
       "A single fix for early boot crashes of kernels built with gcc10 and
        stack protector enabled"
      
      * tag 'x86_urgent_for_v5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Fix early boot crash on gcc-10, third try
    • exec: Move would_dump into flush_old_exec · f87d1c95
      Eric W. Biederman authored
      I goofed when I added mm->user_ns support to would_dump.  I missed the
      fact that in the case of binfmt_loader, binfmt_em86, binfmt_misc, and
      binfmt_script, bprm->file is reassigned.  That made the move of
      would_dump from setup_new_exec to __do_execve_file before exec_binprm
      incorrect, as it can result in would_dump running on the script instead
      of the interpreter of the script.
      
      The net result is that the code stopped making unreadable interpreters
      undumpable, which allows them to be ptraced and written to disk
      without special permissions.  Oops.
      
      The move was necessary because the call in setup_new_exec came after
      bprm->mm was no longer valid.
      
      To correct this mistake, move the misplaced would_dump from
      __do_execve_file into flush_old_exec, before exec_mmap is called.
      
      I tested and confirmed that without this fix I can attach with gdb to
      a script with an unreadable interpreter, and with this fix I can not.
      
      Cc: stable@vger.kernel.org
      Fixes: f84df2a6 ("exec: Ensure mm->user_ns contains the execed files")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>