1. 06 Feb, 2018 9 commits
    • sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS · 32e839dd
      Mel Gorman authored
      The select_idle_sibling() (SIS) rewrite in commit:
      
        10e2f1ac ("sched/core: Rewrite and improve select_idle_siblings()")
      
      ... replaced a domain iteration with a search that broadly speaking
      does a wrapped walk of the scheduler domain sharing a last-level-cache.
      
      While this had a number of improvements, one consequence is that two tasks
      that share a waker/wakee relationship push each other around a socket. Even
      though two tasks may be active, all cores are evenly used. This is great from
      a search perspective and spreads a load across individual cores, but it has
      adverse consequences for cpufreq. As each CPU has relatively low utilisation,
      cpufreq may decide the utilisation is too low to use a higher P-state and
      overall computation throughput suffers.
      
      While individual cpufreq and cpuidle drivers may compensate by artificially
      boosting P-state (at c0) or avoiding lower C-states (during idle), it does
      not help if hardware-based cpufreq (e.g. HWP) is used.
      
      This patch tracks a recently used CPU: the CPU a task was running on when
      it was last a waker, or a CPU it was recently using when it is a wakee.
      During SIS, the recently used CPU is used as a target if it is still
      allowed for the task and is idle.
      
      The benefit may be non-obvious, so consider an example of two tasks
      communicating back and forth. Task A may be an application doing IO while
      task B is a kworker or kthread like journald. Task A may issue IO and wake
      B, and B wakes up A on completion. With the existing scheme this may look
      like the following (the CPU IDs may differ if SMT is in use, but the same
      principle applies).
      
       A (cpu 0)	wake	B (wakes on cpu 1)
       B (cpu 1)	wake	A (wakes on cpu 2)
       A (cpu 2)	wake	B (wakes on cpu 3)
       etc.
      
      A careful reader may wonder why CPU 0 was not idle when B wakes A the
      first time. It is simply because A can be rescheduled to another CPU:
      when B tries to wake up A, prev == target, and the information about
      CPU 0 has been lost.
      
      With this patch, the pattern is more likely to be:
      
       A (cpu 0)	wake	B (wakes on cpu 1)
       B (cpu 1)	wake	A (wakes on cpu 0)
       A (cpu 0)	wake	B (wakes on cpu 1)
       etc.
      
      i.e. two communicating tasks are more likely to use just two cores instead
      of all available cores sharing an LLC.
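
      As a rough illustration, the candidate order described above can be
      modelled in user-space C. This is a simplified sketch, not the kernel's
      select_idle_sibling(): pick_idle_candidate(), the idle[] and allowed[]
      arrays and NR_CPUS are hypothetical, all CPUs are assumed to share one
      LLC, and -1 stands for falling back to the full LLC scan.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8 /* hypothetical LLC size; all CPUs assumed to share it */

/*
 * Hypothetical model of the idle-candidate order described above.
 * Returns the chosen CPU, or -1 when a full LLC scan would be needed.
 */
static int pick_idle_candidate(int target, int prev, int recent_used_cpu,
                               const bool idle[NR_CPUS],
                               const bool allowed[NR_CPUS])
{
    assert(target >= 0 && target < NR_CPUS);
    assert(prev >= 0 && prev < NR_CPUS);

    if (idle[target])
        return target;

    /* Prefer the previous CPU if it is idle. */
    if (prev != target && idle[prev])
        return prev;

    /*
     * New step: try the recently used CPU before scanning, but only if
     * it is still allowed for the task and is idle.
     */
    if (recent_used_cpu >= 0 && recent_used_cpu < NR_CPUS &&
        recent_used_cpu != prev && recent_used_cpu != target &&
        allowed[recent_used_cpu] && idle[recent_used_cpu])
        return recent_used_cpu;

    return -1;
}
```

      In the A/B example, B waking A gives prev == target, so without a
      recently used CPU the model can only fall back to a scan; with
      recent_used_cpu == 0 and CPU 0 idle, it returns CPU 0.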
      
      The most dramatic speedup was noticed on dbench using the XFS filesystem on
      UMA as clients interact heavily with workqueues in that configuration. Note
      that a similar speedup is not observed on ext4 as the wakeup pattern
      is different:
      
                                4.15.0-rc9             4.15.0-rc9
                                 waprev-v1        biasancestor-v1
       Hmean      1      287.54 (   0.00%)      817.01 ( 184.14%)
       Hmean      2     1268.12 (   0.00%)     1781.24 (  40.46%)
       Hmean      4     1739.68 (   0.00%)     1594.47 (  -8.35%)
       Hmean      8     2464.12 (   0.00%)     2479.56 (   0.63%)
       Hmean     64     1455.57 (   0.00%)     1434.68 (  -1.44%)
      
      The results can be less dramatic on NUMA, where automatic balancing interferes
      with the test. Network benchmarks running on localhost are also known to
      benefit quite a bit from this patch (roughly 10% on netperf RR for UDP
      and TCP, depending on the machine). Hackbench also sees small improvements
      (6-11% depending on machine and thread count). The Facebook schbench was also
      tested, but in most cases showed little or no difference in wakeup latencies.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180130104555.4125-5-mgorman@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Do not migrate if the prev_cpu is idle · 806486c3
      Mel Gorman authored
      wake_affine_idle() prefers to move a task to the current CPU if the
      wakeup is due to an interrupt. The expectation is that the interrupt
      data is cache hot and relevant to the waking task as well as avoiding
      a search. However, there is no way to determine if there was cache hot
      data on the previous CPU that may exceed the interrupt data. Furthermore,
      round-robin delivery of interrupts can migrate tasks around a socket where
      each CPU is under-utilised.  This can interact badly with cpufreq which
      makes decisions based on per-cpu data. It has been observed on machines
      with HWP that P-states are not boosted to their maximum levels even though
      the workload is latency and throughput sensitive.
      
      This patch uses the previous CPU for the task if it's idle and cache-affine
      with the current CPU, even if the current CPU is idle due to the wakeup
      being related to the interrupt. This reduces migrations at the cost of
      the interrupt data not being cache hot when the task wakes.
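
      A minimal sketch of the changed decision, assuming the idle and
      cache-sharing checks have already been evaluated into booleans. The
      function name and NO_DECISION sentinel are hypothetical, and the sync
      wakeup path of wake_affine_idle() is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel meaning "no decision made by this helper". */
#define NO_DECISION (-1)

/*
 * Simplified model of the policy described above; the booleans stand in
 * for idle and cache-sharing checks on this_cpu and prev_cpu.
 */
static int wake_affine_idle_model(int this_cpu, int prev_cpu,
                                  bool this_idle, bool prev_idle,
                                  bool share_cache)
{
    /* An idle waking CPU suggests the wakeup came from an interrupt. */
    if (this_idle && share_cache)
        /* New behaviour: stay on prev_cpu if it is idle as well. */
        return prev_idle ? prev_cpu : this_cpu;

    return NO_DECISION;
}
```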
      
      A variety of workloads were tested on various machines and no adverse
      impact was noticed that was outside noise. dbench on ext4 on UMA showed
      roughly 10% reduction in the number of CPU migrations and it is a case
      where interrupts are frequent due to IO completions. In most cases, the
      difference in performance is quite small but variability is often
      reduced. For example, this is the result for pgbench running on a UMA
      machine with different numbers of clients.
      
                                4.15.0-rc9             4.15.0-rc9
                                  baseline              waprev-v1
       Hmean     1     22096.28 (   0.00%)    22734.86 (   2.89%)
       Hmean     4     74633.42 (   0.00%)    75496.77 (   1.16%)
       Hmean     7    115017.50 (   0.00%)   113030.81 (  -1.73%)
       Hmean     12   126209.63 (   0.00%)   126613.40 (   0.32%)
       Hmean     16   131886.91 (   0.00%)   130844.35 (  -0.79%)
       Stddev    1       636.38 (   0.00%)      417.11 (  34.46%)
       Stddev    4       614.64 (   0.00%)      583.24 (   5.11%)
       Stddev    7       542.46 (   0.00%)      435.45 (  19.73%)
       Stddev    12      173.93 (   0.00%)      171.50 (   1.40%)
       Stddev    16      671.42 (   0.00%)      680.30 (  -1.32%)
       CoeffVar  1         2.88 (   0.00%)        1.83 (  36.26%)
      
      Note that the difference in performance is marginal, but at low utilisation
      there is less variability.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180130104555.4125-4-mgorman@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Restructure wake_affine*() to return a CPU id · 3b76c4a3
      Mel Gorman authored
      This is a preparation patch that has wake_affine*() return a CPU ID instead of
      a boolean. The intent is to allow the wake_affine() helpers to be avoided
      if a decision is already made. This patch has no functional change.
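
      The benefit of returning a CPU id can be sketched with two hypothetical
      helpers; affine_idle(), affine_weight() and the NO_CPU sentinel are
      illustrative names, not kernel code. Once the first helper picks a CPU,
      the second is never called, and the weight_calls counter makes the
      skipped call visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical out-of-range sentinel meaning "no CPU chosen". */
#define NO_CPU 64

static int weight_calls; /* counts how often the second helper runs */

/* Hypothetical first helper: decides only when the waking CPU is idle. */
static int affine_idle(int this_cpu, bool this_idle)
{
    return this_idle ? this_cpu : NO_CPU;
}

/* Hypothetical second, more expensive helper. */
static int affine_weight(int this_cpu, bool this_lighter)
{
    weight_calls++;
    return this_lighter ? this_cpu : NO_CPU;
}

/* Returning a CPU id lets the caller skip helpers once a CPU is chosen. */
static int wake_affine_model(int this_cpu, int prev_cpu,
                             bool this_idle, bool this_lighter)
{
    int target = affine_idle(this_cpu, this_idle);

    if (target == NO_CPU)
        target = affine_weight(this_cpu, this_lighter);

    /* No helper chose a CPU: keep the task where it was. */
    return target == NO_CPU ? prev_cpu : target;
}
```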
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180130104555.4125-3-mgorman@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Remove unnecessary parameters from wake_affine_idle() · 89a55f56
      Mel Gorman authored
      wake_affine_idle() takes parameters it never uses, so clean it up.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180130104555.4125-2-mgorman@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/rt: Make update_curr_rt() more accurate · e7ad2031
      Wen Yang authored
      rq->clock_task may be updated between the two calls of
      rq_clock_task() in update_curr_rt(). Calling rq_clock_task() only
      once makes it more accurate and efficient, using update_curr() as
      a reference.
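
      The accounting difference can be shown with a hypothetical clock that
      advances on every read, standing in for rq->clock_task changing between
      the two rq_clock_task() calls. The structure and function names are
      illustrative only, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical clock that advances by 10 units on every read, mimicking
 * rq->clock_task being updated between two rq_clock_task() calls.
 */
static uint64_t fake_clock;

static uint64_t read_clock(void)
{
    fake_clock += 10;
    return fake_clock;
}

struct entity_model {
    uint64_t exec_start; /* when the current accounting window began */
    uint64_t sum_exec;   /* total runtime accounted so far */
};

/* Before: two reads, so time between them is never accounted. */
static void update_two_reads(struct entity_model *se)
{
    uint64_t delta = read_clock() - se->exec_start;

    se->sum_exec += delta;
    se->exec_start = read_clock();
}

/* After: one read is reused for both the delta and the new start. */
static void update_one_read(struct entity_model *se)
{
    uint64_t now = read_clock();

    se->sum_exec += now - se->exec_start;
    se->exec_start = now;
}
```

      After two updates in this model, the two-read version has accounted
      only half of the elapsed time, while the single-read version accounts
      all of it.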
      Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: zhong.weidong@zte.com.cn
      Link: http://lkml.kernel.org/r/1517800721-42092-1-git-send-email-wen.yang99@zte.com.cn
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/rt: Up the root domain ref count when passing it around via IPIs · 364f5665
      Steven Rostedt (VMware) authored
      When issuing an IPI RT push, where an IPI is sent to each CPU that has more
      than one RT task scheduled on it, the code references the root domain's
      rto_mask, which contains all the CPUs within the root domain that have more
      than one RT task in the runnable state. The problem is that after the IPIs
      are initiated, the rq->lock is released. This means that the root domain
      associated with the run queue could be freed while the IPIs are going around.
      
      Add a sched_get_rd() and a sched_put_rd() that will increment and decrement
      the root domain's ref count respectively. This way when initiating the IPIs,
      the scheduler will up the root domain's ref count before releasing the
      rq->lock, ensuring that the root domain does not go away until the IPI round
      is complete.
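
      A user-space sketch of the get/put pattern using C11 atomics. struct
      rd_model, rd_get() and rd_put() are hypothetical stand-ins for the root
      domain and sched_get_rd()/sched_put_rd(); a flag replaces the real free
      path, which is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct root_domain. */
struct rd_model {
    atomic_int refcount;
    bool freed; /* set instead of actually freeing, for illustration */
};

/* Modelled on sched_get_rd(): take an extra reference. */
static void rd_get(struct rd_model *rd)
{
    atomic_fetch_add(&rd->refcount, 1);
}

/* Modelled on sched_put_rd(): dropping the last reference releases it. */
static void rd_put(struct rd_model *rd)
{
    /* atomic_fetch_sub() returns the value before the decrement. */
    if (atomic_fetch_sub(&rd->refcount, 1) == 1)
        rd->freed = true;
}
```

      In this model the owner's put no longer releases the domain while an
      IPI round still holds its reference; only the final put does.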
      Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 4bdced5c ("sched/rt: Simplify the IPI based RT balancing logic")
      Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/rt: Use container_of() to get root domain in rto_push_irq_work_func() · ad0f1d9d
      Steven Rostedt (VMware) authored
      When the rto_push_irq_work_func() is called, it looks at the RT overloaded
      bitmask in the root domain via the runqueue (rq->rd). The problem is that
      during CPU up and down, nothing here stops rq->rd from changing between
      taking the rq->rd->rto_lock and releasing it. That means the lock that is
      released is not the same lock that was taken.
      
      Instead of using this_rq()->rd to get the root domain, as the irq work is
      part of the root domain, we can simply get the root domain from the irq work
      that is passed to the routine:
      
       container_of(work, struct root_domain, rto_push_work)
      
      This keeps the root domain consistent.
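
      A self-contained sketch of the container_of() technique. The macro below
      is a simplified, portable variant without the kernel's type checking,
      and the trimmed struct definitions and the function name are
      hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Portable variant of container_of(), without type checking. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct irq_work {
    int pending;
};

struct root_domain {
    int id;
    struct irq_work rto_push_work; /* embedded, so rd can be recovered */
};

/* The callback only receives the irq_work pointer. */
static int rto_push_irq_work_func_model(struct irq_work *work)
{
    struct root_domain *rd =
        container_of(work, struct root_domain, rto_push_work);

    return rd->id;
}
```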
      Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 4bdced5c ("sched/rt: Simplify the IPI based RT balancing logic")
      Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Optimize update_stats_*() · 2ed41a55
      Peter Zijlstra authored
      These functions are already gated by schedstat_enabled(); there is no
      point in then issuing another static_branch for every individual
      update in them.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Optimize ttwu_stat() · b85c8b71
      Peter Zijlstra authored
      The whole of ttwu_stat() is guarded by a single schedstat_enabled();
      there is absolutely no point in then issuing another static_branch for
      every single schedstat_inc() in there.
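
      The saving can be illustrated by counting how often the enable check
      runs. All names below are hypothetical, and a plain boolean stands in
      for the static key, so this models only the number of checks, not the
      cost of a static_branch.

```c
#include <assert.h>
#include <stdbool.h>

static bool stats_enabled = true; /* models the schedstats static key */
static int key_checks;            /* how many times the key is tested */

static bool schedstat_enabled_model(void)
{
    key_checks++;
    return stats_enabled;
}

/* Gated increment: tests the key on every call. */
#define schedstat_inc_model(var) \
    do { if (schedstat_enabled_model()) (var)++; } while (0)

/* Ungated increment, for use inside an already gated function. */
#define schedstat_inc_ungated(var) ((var)++)

struct ttwu_counters { long wakeups, local, remote; };

static void ttwu_stat_before(struct ttwu_counters *c, bool local)
{
    if (!schedstat_enabled_model())
        return;
    schedstat_inc_model(c->wakeups);   /* redundant second check */
    if (local)
        schedstat_inc_model(c->local);
    else
        schedstat_inc_model(c->remote);
}

static void ttwu_stat_after(struct ttwu_counters *c, bool local)
{
    if (!schedstat_enabled_model())    /* the single check */
        return;
    schedstat_inc_ungated(c->wakeups);
    if (local)
        schedstat_inc_ungated(c->local);
    else
        schedstat_inc_ungated(c->remote);
}
```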
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 05 Feb, 2018 11 commits
    • membarrier/selftest: Test private expedited sync core command · 460e8c33
      Mathieu Desnoyers authored
      Test the new MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE and
      MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE commands.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Alice Ferrazzi <alice.ferrazzi@gmail.com>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Elder <paul.elder@pitt.edu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kselftest@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-12-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • membarrier/arm64: Provide core serializing command · f1e3a12b
      Mathieu Desnoyers authored
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-11-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • membarrier/x86: Provide core serializing command · 10bcc80e
      Mathieu Desnoyers authored
      There are two places where core serialization is needed by membarrier:
      
      1) When returning from the membarrier IPI,
      2) After scheduler updates curr to a thread with a different mm, before
         going back to user-space, since the curr->mm is used by membarrier to
         check whether it needs to send an IPI to that CPU.
      
      x86-32 uses IRET to return from interrupts, and both IRET and SYSEXIT to go
      back to user-space. The IRET instruction is core serializing, but SYSEXIT
      is not.
      
      x86-64 uses IRET to return from interrupts, which takes care of the IPI.
      However, it can return to user-space through either SYSRETL (compat
      code), SYSRETQ, or IRET. Given that SYSRET{L,Q} are not core serializing,
      we rely instead on write_cr3() performed by switch_mm() to provide core
      serialization after changing the current mm, and deal with the special
      case of kthread -> uthread (temporarily keeping the current mm in
      active_mm) by adding a sync_core() in that specific case.
      
      Use the new sync_core_before_usermode() to guarantee this.
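
      The kthread -> uthread special case can be expressed as a small
      predicate. This is a hypothetical model of the reasoning above, not x86
      kernel code: struct mm_model and needs_explicit_sync_core() are
      illustrative, and the model assumes switch_mm() performs no write_cr3()
      when the loaded mm does not change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm_model { int id; }; /* hypothetical stand-in for mm_struct */

/*
 * prev_mm:        mm of the outgoing task (NULL for a kernel thread)
 * prev_active_mm: mm loaded on the CPU (a kthread keeps the last user mm)
 * next_mm:        mm of the incoming task (NULL for a kernel thread)
 *
 * Returns true when curr->mm changes to a user mm but no CR3 write
 * happens, i.e. the case needing an explicit sync_core().
 */
static bool needs_explicit_sync_core(const struct mm_model *prev_mm,
                                     const struct mm_model *prev_active_mm,
                                     const struct mm_model *next_mm)
{
    if (next_mm == NULL)          /* not returning to user-space */
        return false;
    if (prev_mm == next_mm)       /* curr->mm unchanged */
        return false;
    if (prev_active_mm != next_mm)
        return false;             /* loaded mm changes: write_cr3() serializes */
    return true;                  /* kthread -> uthread, same mm still loaded */
}
```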
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-10-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • membarrier: Provide core serializing command, *_SYNC_CORE · 70216e18
      Mathieu Desnoyers authored
      Provide a core serializing membarrier command to support memory reclaim
      by JIT.
      
      Each architecture needs to explicitly opt into that support by
      documenting in their architecture code how they provide the core
      serializing instructions required when returning from the membarrier
      IPI, and after the scheduler has updated the curr->mm pointer (before
      going back to user-space). They should then select
      ARCH_HAS_MEMBARRIER_SYNC_CORE to enable support for that command on
      their architecture.
      
      Architectures selecting this feature need to either document that
      they issue core serializing instructions when returning to user-space,
      or implement their architecture-specific sync_core_before_usermode().
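
      From user-space, a JIT would register once and then issue the
      core-serializing command after modifying code. This is a sketch
      assuming a Linux system; the MB_CMD_* constants are local copies of the
      matching enum membarrier_cmd values, defined here in case the installed
      headers predate this commit, and jit_sync_core_barrier() is a
      hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local copies of the enum membarrier_cmd values. */
#define MB_CMD_QUERY                                0
#define MB_CMD_PRIVATE_EXPEDITED_SYNC_CORE          (1 << 5)
#define MB_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE (1 << 6)

static long sys_membarrier(int cmd, unsigned int flags)
{
#ifdef SYS_membarrier
    return syscall(SYS_membarrier, cmd, flags);
#else
    (void)cmd;
    (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}

/* True if the QUERY result advertises every bit in cmd. */
static bool mb_supported(long mask, int cmd)
{
    return mask >= 0 && (mask & cmd) == cmd;
}

/*
 * Register, then issue a core-serializing barrier, e.g. after a JIT has
 * rewritten code other threads may run. Returns 0, or -1 with errno set.
 */
static int jit_sync_core_barrier(void)
{
    long mask = sys_membarrier(MB_CMD_QUERY, 0);

    if (mask < 0)
        return -1;
    if (!mb_supported(mask, MB_CMD_PRIVATE_EXPEDITED_SYNC_CORE)) {
        errno = EINVAL; /* architecture has not opted in */
        return -1;
    }
    if (sys_membarrier(MB_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0) != 0)
        return -1;
    return (int)sys_membarrier(MB_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}
```

      On an architecture that has not selected ARCH_HAS_MEMBARRIER_SYNC_CORE,
      the query result lacks the SYNC_CORE bit and this helper reports EINVAL.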
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-9-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • lockin/x86: Implement sync_core_before_usermode() · ac1ab12a
      Mathieu Desnoyers authored
      Ensure that a core serializing instruction is issued before returning to
      user-mode. x86 implements return to user-space through sysexit, sysretl,
      and sysretq, which are not core serializing.
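
      The rule stated above reduces to a lookup on the return path. This is a
      hypothetical model for illustration only: the enum and function are not
      kernel code, and they encode only which instructions the text describes
      as core serializing (IRET) and which are not.

```c
#include <assert.h>
#include <stdbool.h>

/* Return-to-user-space paths named above (hypothetical enum). */
enum x86_return_path { RET_IRET, RET_SYSEXIT, RET_SYSRETL, RET_SYSRETQ };

/*
 * IRET is core serializing; SYSEXIT and SYSRET{L,Q} are not, so only
 * those paths need an explicit serializing instruction first.
 */
static bool needs_sync_core(enum x86_return_path path)
{
    switch (path) {
    case RET_IRET:
        return false;
    case RET_SYSEXIT:
    case RET_SYSRETL:
    case RET_SYSRETQ:
        return true;
    }
    return true; /* be conservative for unexpected values */
}
```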
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-8-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking: Introduce sync_core_before_usermode() · e61938a9
      Mathieu Desnoyers authored
      Introduce an architecture function that ensures the current CPU
      issues a core serializing instruction before returning to usermode.
      
      This is needed for the membarrier "sync_core" command.
      
      Architectures defining the sync_core_before_usermode() static inline
      need to select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-7-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • membarrier/selftest: Test global expedited command · 92485487
      Mathieu Desnoyers authored
      Test the new MEMBARRIER_CMD_GLOBAL_EXPEDITED and
      MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED commands.
      
      Adapt to the MEMBARRIER_CMD_SHARED -> MEMBARRIER_CMD_GLOBAL rename.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Alice Ferrazzi <alice.ferrazzi@gmail.com>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Elder <paul.elder@pitt.edu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kselftest@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-6-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • membarrier: Provide GLOBAL_EXPEDITED command · c5f58bd5
      Mathieu Desnoyers authored
      Allow expedited membarrier to be used for data shared between processes
      through shared memory.
      
      Processes wishing to receive the membarriers register with
      MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. Those wishing to issue a
      membarrier invoke MEMBARRIER_CMD_GLOBAL_EXPEDITED.
      
      This allows an extremely simple kernel-level implementation: we have almost
      everything we need with the PRIVATE_EXPEDITED barrier code. All we need
      to do is add a flag in the mm_struct that will be used to check
      whether we need to send the IPI to the current thread of each CPU.
      
      There is a slight downside to this approach compared to targeting
      specific shared memory users: when performing a membarrier operation,
      all registered "global" receivers will get the barrier, even if they
      don't share a memory mapping with the sender issuing
      MEMBARRIER_CMD_GLOBAL_EXPEDITED.
      
      This registration approach seems to fit the requirement of not
      disturbing processes that care deeply about real-time: they
      simply should not register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.
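
      A hypothetical model of the IPI target selection described above: given
      the mm of the task currently running on each CPU, only CPUs whose
      current mm carries the registration flag are targeted. The flag name,
      struct and function are illustrative, and kernel threads (no mm) and
      the calling CPU are assumed to be skipped.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4
/* Hypothetical flag set by MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. */
#define MM_GLOBAL_EXPEDITED_REGISTERED 0x1u

struct mm_model { unsigned int membarrier_state; };

/*
 * curr_mm[cpu] is the mm of the task running on each CPU (NULL for a
 * kernel thread). Returns a bitmask of CPUs that would get an IPI.
 */
static unsigned int global_expedited_targets(
        const struct mm_model *const curr_mm[NR_CPUS], int this_cpu)
{
    unsigned int mask = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu == this_cpu || curr_mm[cpu] == NULL)
            continue;
        /* Only the registration flag is checked, not shared mappings. */
        if (curr_mm[cpu]->membarrier_state & MM_GLOBAL_EXPEDITED_REGISTERED)
            mask |= 1u << cpu;
    }
    return mask;
}
```

      Because only the flag is checked, every registered process is
      interrupted whether or not it shares a mapping with the sender, which
      is the downside noted above.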
      
      In order to align the membarrier command names, the "MEMBARRIER_CMD_SHARED"
      command is renamed to "MEMBARRIER_CMD_GLOBAL", keeping an alias of
      MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_GLOBAL for UAPI header backward
      compatibility.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-5-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • membarrier: Document scheduler barrier requirements · 306e0604
      Mathieu Desnoyers authored
      Document the membarrier requirement on having a full memory barrier in
      __schedule() after coming from user-space, before storing to rq->curr.
      It is provided by smp_mb__after_spinlock() in __schedule().
      
      Document that membarrier requires a full barrier on transition from
      kernel thread to userspace thread. We currently have an implicit barrier
      from atomic_dec_and_test() in mmdrop() that ensures this.
      
      The x86 switch_mm_irqs_off() full barrier is currently provided by many
      cpumask update operations as well as write_cr3(). Document that
      write_cr3() provides this barrier.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-4-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Mathieu Desnoyers's avatar
      powerpc, membarrier: Skip memory barrier in switch_mm() · 3ccfebed
      Mathieu Desnoyers authored
      Allow PowerPC to skip the full memory barrier in switch_mm(), and
      only issue the barrier when scheduling into a task belonging to a
      process that has registered to use private expedited membarrier.
      
      Threads that share the same VM but belong to different thread
      groups are a tricky case. This has a few consequences:
      
      It turns out that we cannot rely on get_nr_threads(p) to count the
      number of threads using a VM. We can use
      (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
      instead to skip the synchronize_sched() for cases where the VM only has
      a single user, and that user only has a single thread.
      
      It also turns out that we cannot use for_each_thread() to set
      thread flags in all threads using a VM, as it only iterates over the
      thread group.
      
      Therefore, test the membarrier state variable directly rather than
      relying on thread flags. This means
      membarrier_register_private_expedited() needs to set the
      MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
      only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
      private expedited membarrier commands to succeed.
      membarrier_arch_switch_mm() now tests for the
      MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.
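A minimal userspace model of the registration order described above may help. It is a sketch under stated assumptions, not the kernel code: `struct mm_model`, the `STATE_*` flags and all helper functions are hypothetical stand-ins for `mm->membarrier_state`, `MEMBARRIER_STATE_PRIVATE_EXPEDITED{,_READY}`, `synchronize_sched()` and the switch_mm() check, and a counter replaces the real grace-period wait.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flags modelling MEMBARRIER_STATE_PRIVATE_EXPEDITED{,_READY}. */
enum {
    STATE_PRIVATE_EXPEDITED       = 1 << 0,
    STATE_PRIVATE_EXPEDITED_READY = 1 << 1,
};

/* Hypothetical stand-in for the relevant parts of struct mm_struct. */
struct mm_model {
    atomic_int membarrier_state;
    int mm_users;   /* models atomic_read(&mm->mm_users) */
    int nr_threads; /* models get_nr_threads(p) for the caller */
};

static int sync_calls; /* number of simulated synchronize_sched() calls */

static void mm_model_init(struct mm_model *mm, int users, int threads)
{
    assert(users >= 1 && threads >= 1);
    atomic_init(&mm->membarrier_state, 0);
    mm->mm_users = users;
    mm->nr_threads = threads;
}

/* Stand-in for synchronize_sched(): here it only counts invocations. */
static void synchronize_sched_model(void)
{
    sync_calls++;
}

/* Registration order: set the flag, wait if needed, then mark READY. */
static void register_private_expedited(struct mm_model *mm)
{
    if (atomic_load(&mm->membarrier_state) & STATE_PRIVATE_EXPEDITED_READY)
        return; /* already registered */
    atomic_fetch_or(&mm->membarrier_state, STATE_PRIVATE_EXPEDITED);
    /* Skip the wait only for a single user with a single thread. */
    if (!(mm->mm_users == 1 && mm->nr_threads == 1))
        synchronize_sched_model();
    atomic_fetch_or(&mm->membarrier_state, STATE_PRIVATE_EXPEDITED_READY);
}

/* The command succeeds only once READY is set; otherwise -EPERM. */
static int private_expedited(struct mm_model *mm)
{
    if (!(atomic_load(&mm->membarrier_state) & STATE_PRIVATE_EXPEDITED_READY))
        return -EPERM;
    return 0; /* the real command would now IPI CPUs running this mm */
}

/* What the switch_mm() path tests: the state variable, not thread flags. */
static bool switch_mm_needs_barrier(struct mm_model *next)
{
    return atomic_load(&next->membarrier_state) & STATE_PRIVATE_EXPEDITED;
}
```

In this model the bit that the switch_mm() check reads is published before the wait, and READY, which gates the command, only afterwards; a single-user, single-thread mm skips the wait entirely, mirroring the `mm_users == 1 && get_nr_threads(p) == 1` shortcut.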
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-3-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Mathieu Desnoyers's avatar
      membarrier/selftest: Test private expedited command · 667ca1ec
      Mathieu Desnoyers authored
      Test the new MEMBARRIER_CMD_PRIVATE_EXPEDITED and
      MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED commands.
      
      Add checks expecting specific error values on system calls expected to
      fail.
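For reference, a caller exercising the two commands tested here might look like the sketch below. It assumes kernel and C library headers that provide `<linux/membarrier.h>` and `__NR_membarrier`; the `sys_membarrier()` and `check_private_expedited()` helpers are hypothetical, and the error codes checked (EPERM for the command before registration, EINVAL for non-zero flags) are the ones the membarrier interface documents for those cases. If MEMBARRIER_CMD_QUERY does not report support, the check is skipped, in the spirit of a kselftest skip.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/membarrier.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: invoke the raw system call (no libc wrapper assumed). */
static int sys_membarrier(int cmd, unsigned int flags)
{
    return (int)syscall(__NR_membarrier, cmd, flags, 0);
}

/*
 * Hypothetical check: returns 0 on success or when the commands are not
 * supported (skip), and -1 if a call does not behave as expected.
 */
static int check_private_expedited(void)
{
    int mask = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);

    if (mask < 0 || !(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED)) {
        fprintf(stderr, "private expedited membarrier unsupported, skipping\n");
        return 0;
    }
    /* Before registration, the command is expected to fail with EPERM. */
    if (sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) != -1 || errno != EPERM)
        return -1;
    /* Non-zero flags are expected to be rejected with EINVAL. */
    if (sys_membarrier(MEMBARRIER_CMD_QUERY, 1) != -1 || errno != EINVAL)
        return -1;
    if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) != 0)
        return -1;
    /* After registration, the command is expected to succeed. */
    return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0 ? 0 : -1;
}
```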
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Alice Ferrazzi <alice.ferrazzi@gmail.com>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Elder <paul.elder@pitt.edu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kselftest@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-2-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 30 Jan, 2018 19 commits
    • Linus Torvalds's avatar
      Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 72906f38
      Linus Torvalds authored
      Pull x86 hyperv update from Ingo Molnar:
       "Enable PCID support on Hyper-V guests"
      
      * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/hyperv: Stop suppressing X86_FEATURE_PCID
    • Linus Torvalds's avatar
      Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3ccabd6d
      Linus Torvalds authored
      Pull x86 cleanups from Ingo Molnar:
       "Misc cleanups"
      
      * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Remove unused IOMMU_STRESS Kconfig
        x86/extable: Mark exception handler functions visible
        x86/timer: Don't inline __const_udelay
        x86/headers: Remove duplicate #includes
    • Linus Torvalds's avatar
      Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5289d300
      Linus Torvalds authored
      Pull x86 apic cleanup from Ingo Molnar:
       "A single change simplifying the APIC code a bit"
      
      * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/apic: Remove local var in flat_send_IPI_allbutself()
    • Linus Torvalds's avatar
      Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · af8c5e2d
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
       "The main changes in this cycle were:
      
         - Implement frequency/CPU invariance and OPP selection for
           SCHED_DEADLINE (Juri Lelli)
      
         - Tweak the task migration logic for better multi-tasking
           workload scalability (Mel Gorman)
      
         - Misc cleanups, fixes and improvements"
      
      * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/deadline: Make bandwidth enforcement scale-invariant
        sched/cpufreq: Move arch_scale_{freq,cpu}_capacity() outside of #ifdef CONFIG_SMP
        sched/cpufreq: Remove arch_scale_freq_capacity()'s 'sd' parameter
        sched/cpufreq: Always consider all CPUs when deciding next freq
        sched/cpufreq: Split utilization signals
        sched/cpufreq: Change the worker kthread to SCHED_DEADLINE
        sched/deadline: Move CPU frequency selection triggering points
        sched/cpufreq: Use the DEADLINE utilization signal
        sched/deadline: Implement "runtime overrun signal" support
        sched/fair: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache
        sched/fair: Correct obsolete comment about cpufreq_update_util()
        sched/fair: Remove impossible condition from find_idlest_group_cpu()
        sched/cpufreq: Don't pass flags to sugov_set_iowait_boost()
        sched/cpufreq: Initialize sg_cpu->flags to 0
        sched/fair: Consider RT/IRQ pressure in capacity_spare_wake()
        sched/fair: Use 'unsigned long' for utilization, consistently
        sched/core: Rework and clarify prepare_lock_switch()
        sched/fair: Remove unused 'curr' parameter from wakeup_gran
        sched/headers: Constify object_is_on_stack()
    • Linus Torvalds's avatar
      Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a1c75e17
      Linus Torvalds authored
      Pull x86 RAS updates from Ingo Molnar:
      
       - various AMD SMCA error parsing/reporting improvements (Yazen Ghannam)
      
       - extend Intel CMCI error reporting to more cases (Xie XiuQi)
      
      * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/MCE: Make correctable error detection look at the Deferred bit
        x86/MCE: Report only DRAM ECC as memory errors on AMD systems
        x86/MCE/AMD: Define a function to get SMCA bank type
        x86/mce/AMD: Don't set DEF_INT_TYPE in MSR_CU_DEF_ERR on SMCA systems
        x86/MCE: Extend table to report action optional errors through CMCI too
    • Linus Torvalds's avatar
      Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d8b91dde
      Linus Torvalds authored
      Pull perf updates from Ingo Molnar:
       "Kernel side changes:
      
         - Clean up the x86 instruction decoder (Masami Hiramatsu)
      
         - Add new uprobes optimization for PUSH instructions on x86 (Yonghong
           Song)
      
         - Add MSR_IA32_THERM_STATUS to the MSR events (Stephane Eranian)
      
         - Fix misc bugs, update documentation, plus various cleanups (Jiri
           Olsa)
      
        There's a large number of tooling side improvements:
      
         - Intel-PT/BTS improvements (Adrian Hunter)
      
         - Numerous 'perf trace' improvements (Arnaldo Carvalho de Melo)
      
         - Introduce an errno code to string facility (Hendrik Brueckner)
      
         - Various build system improvements (Jiri Olsa)
      
         - Add support for CoreSight trace decoding by making the perf tools
           use the external openCSD (Mathieu Poirier, Tor Jeremiassen)
      
         - Add ARM Statistical Profiling Extensions (SPE) support (Kim
           Phillips)
      
         - libtraceevent updates (Steven Rostedt)
      
         - Intel vendor event JSON updates (Andi Kleen)
      
         - Introduce 'perf report --mmaps' and 'perf report --tasks' to show
           info present in 'perf.data' (Jiri Olsa, Arnaldo Carvalho de Melo)
      
         - Add infrastructure to record the first and last sample times in
           the perf.data file header when all samples in a 'perf record'
           session are processed (for example during build-id processing,
           or when that information is explicitly requested), and use them
           in 'perf report --time', which also gained support for percentage
           slices in addition to absolute ones.
      
           I.e. now it is possible to ask for the samples in the 10%-20% time
           slice of a perf.data file (Jin Yao)
      
         - Allow system wide 'perf stat --per-thread', sorting the result (Jin
           Yao)
      
           E.g.:
      
            [root@jouet ~]# perf stat --per-thread --metrics IPC
            ^C
             Performance counter stats for 'system wide':
      
                        make-22229  23,012,094,032  inst_retired.any   #  0.8 IPC
                         cc1-22419     692,027,497  inst_retired.any   #  0.8 IPC
                         gcc-22418     328,231,855  inst_retired.any   #  0.9 IPC
                         cc1-22509     220,853,647  inst_retired.any   #  0.8 IPC
                         gcc-22486     199,874,810  inst_retired.any   #  1.0 IPC
                          as-22466     177,896,365  inst_retired.any   #  0.9 IPC
                         cc1-22465     150,732,374  inst_retired.any   #  0.8 IPC
                         gcc-22508     112,555,593  inst_retired.any   #  0.9 IPC
                         cc1-22487     108,964,079  inst_retired.any   #  0.7 IPC
             qemu-system-x86-2697       21,330,550  inst_retired.any   #  0.3 IPC
             systemd-journal-551        20,642,951  inst_retired.any   #  0.4 IPC
             docker-containe-17651       9,552,892  inst_retired.any   #  0.5 IPC
             dockerd-current-9809        7,528,586  inst_retired.any   #  0.5 IPC
                        make-22153  12,504,194,380  inst_retired.any   #  0.8 IPC
                     python2-22429  12,081,290,954  inst_retired.any   #  0.8 IPC
            <SNIP>
                     python2-22429  15,026,328,103  cpu_clk_unhalted.thread
                         cc1-22419     826,660,193  cpu_clk_unhalted.thread
                         gcc-22418     365,321,295  cpu_clk_unhalted.thread
                         cc1-22509     279,169,362  cpu_clk_unhalted.thread
                         gcc-22486     210,156,950  cpu_clk_unhalted.thread
            <SNIP>
      
                 5.638075538 seconds time elapsed
      
           [root@jouet ~]#
      
         - Improve shell auto-completion of perf events (Jin Yao)
      
         - 'perf probe' improvements (Masami Hiramatsu)
      
         - Improve PMU infrastructure to support arm64's ThunderX2
           implementation-defined core events (Ganapatrao Kulkarni)
      
         - Various annotation related improvements and fixes (Thomas Richter)
      
         - Clarify usage of 'overwrite' and 'backward' in the evlist/mmap
           code, removing the 'overwrite' parameter from several functions as
           it was always passed as 'false' (Wang Nan)
      
         - Fix/improve 'perf record' reverse recording support (Wang Nan)
      
         - Improve command line options documentation (Sihyeon Jang)
      
         - Optimize sample parsing for ordering events, where we don't need to
           parse all the PERF_SAMPLE_ bits, just the ones leading to the
           timestamp needed to reorder events (Jiri Olsa)
      
         - Generalize the annotation code to support source information
           other than that obtained from objdump/DWARF, starting with python
           scripts, support for which is slated to be merged soon (Jiri Olsa)
      
         - ... and a lot more that I failed to list, see the shortlog and
           changelog for details"
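The IPC column in the 'perf stat --per-thread --metrics IPC' example above can be cross-checked by hand. Assuming the metric is inst_retired.any divided by cpu_clk_unhalted.thread for the same thread (the tasks that appear in both halves of the output agree with this to one decimal place), a small sketch that rounds to tenths with integer arithmetic is shown below; `ipc_tenths()` and `print_ipc()` are hypothetical helpers, not perf code.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Instructions per cycle, rounded half up to one decimal place and
 * returned in tenths (8 means 0.8 IPC). Integer math avoids libm.
 */
static unsigned long long ipc_tenths(unsigned long long instructions,
                                     unsigned long long cycles)
{
    assert(cycles > 0);
    return (instructions * 10 + cycles / 2) / cycles;
}

/* Print one row loosely in the style of the per-thread listing. */
static void print_ipc(const char *comm, unsigned long long instructions,
                      unsigned long long cycles)
{
    unsigned long long tenths = ipc_tenths(instructions, cycles);

    printf("%20s  #  %llu.%llu IPC\n", comm, tenths / 10, tenths % 10);
}
```

For python2-22429, ipc_tenths(12081290954, 15026328103) is 8, i.e. 0.8 IPC, and for gcc-22486, 199,874,810 / 210,156,950 rounds to 1.0, both matching the listing.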
      
      * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits)
        perf trace beauty flock: Move to separate object file
        perf evlist: Remove fcntl.h from evlist.h
        perf trace beauty futex: Beautify FUTEX_BITSET_MATCH_ANY
        perf trace: Do not print from time delta for interrupted syscall lines
        perf trace: Add --print-sample
        perf bpf: Remove misplaced __maybe_unused attribute
        MAINTAINERS: Adding entry for CoreSight trace decoding
        perf tools: Add mechanic to synthesise CoreSight trace packets
        perf tools: Add full support for CoreSight trace decoding
        pert tools: Add queue management functionality
        perf tools: Add functionality to communicate with the openCSD decoder
        perf tools: Add support for decoding CoreSight trace data
        perf tools: Add decoder mechanic to support dumping trace data
        perf tools: Add processing of coresight metadata
        perf tools: Add initial entry point for decoder CoreSight traces
        perf tools: Integrating the CoreSight decoding library
        perf vendor events intel: Update IvyTown files to V20
        perf vendor events intel: Update IvyBridge files to V20
        perf vendor events intel: Update BroadwellDE events to V7
        perf vendor events intel: Update SkylakeX events to V1.06
        ...
    • Linus Torvalds's avatar
      Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5e7481a2
      Linus Torvalds authored
      Pull locking updates from Ingo Molnar:
       "The main changes relate to making lock_is_held() et al (and external
        wrappers of them) work on const data types - this requires const
        propagation through the depths of lockdep.
      
        This removes a number of ugly type hacks the external helpers used"
      
      * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        lockdep: Convert some users to const
        lockdep: Make lockdep checking constant
        lockdep: Assign lock keys on registration
    • Linus Torvalds's avatar
      Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b8dbf730
      Linus Torvalds authored
      Pull EFI updates from Ingo Molnar:
       "The biggest change in this cycle was the addition of ARM CPER error
        decoding when printing EFI errors into the kernel log.
      
        There are also misc smaller updates: documentation update, cleanups
        and an EFI memory map permissions quirk"
      
      * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/efi: Clarify that reset attack mitigation needs appropriate userspace
        efi: Parse ARM error information value
        efi: Move ARM CPER code to new file
        efi: Use PTR_ERR_OR_ZERO()
        arm64/efi: Ignore EFI_MEMORY_XP attribute if RP and/or WP are set
        efi/capsule-loader: Fix pr_err() string to end with newline
    • Linus Torvalds's avatar
      Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d7727946
      Linus Torvalds authored
      Pull RCU updates from Ingo Molnar:
       "The main RCU changes in this cycle were:
      
         - Updates to use cond_resched() instead of cond_resched_rcu_qs()
           where feasible (currently everywhere except in kernel/rcu and in
           kernel/torture.c). Also a couple of fixes to avoid sending IPIs to
           offline CPUs.
      
         - Updates to simplify RCU's dyntick-idle handling.
      
         - Updates to remove almost all uses of smp_read_barrier_depends() and
           read_barrier_depends().
      
         - Torture-test updates.
      
         - Miscellaneous fixes"
      
      * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
        torture: Save a line in stutter_wait(): while -> for
        torture: Eliminate torture_runnable and perf_runnable
        torture: Make stutter less vulnerable to compilers and races
        locking/locktorture: Fix num reader/writer corner cases
        locking/locktorture: Fix rwsem reader_delay
        torture: Place all torture-test modules in one MAINTAINERS group
        rcutorture/kvm-build.sh: Skip build directory check
        rcutorture: Simplify functions.sh include path
        rcutorture: Simplify logging
        rcutorture/kvm-recheck-*: Improve result directory readability check
        rcutorture/kvm.sh: Support execution from any directory
        rcutorture/kvm.sh: Use consistent help text for --qemu-args
        rcutorture/kvm.sh: Remove unused variable, `alldone`
        rcutorture: Remove unused script, config2frag.sh
        rcutorture/configinit: Fix build directory error message
        rcutorture: Preempt RCU-preempt readers more vigorously
        torture: Reduce #ifdefs for preempt_schedule()
        rcu: Remove have_rcu_nocb_mask from tree_plugin.h
        rcu: Add comment giving debug strategy for double call_rcu()
        tracing, rcu: Hide trace event rcu_nocb_wake when not used
        ...
    • Linus Torvalds's avatar
      Merge branch 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c1488798
      Linus Torvalds authored
      Pull STRICT_DEVMEM default from Ingo Molnar:
       "Make CONFIG_STRICT_DEVMEM default-y on x86 and arm64 as well, to
        follow the distro status quo"
      
      * 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Kconfig: Make STRICT_DEVMEM default-y on x86 and arm64
    • Linus Torvalds's avatar
      Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6304672b
      Linus Torvalds authored
      Pull x86/pti updates from Thomas Gleixner:
       "Another set of melted spectrum related changes:
      
         - Code simplifications and cleanups for RSB and retpolines.
      
         - Make the indirect calls in KVM speculation safe.
      
         - Whitelist CPUs which are known not to speculate from Meltdown and
           prepare for the new CPUID flag which tells the kernel that a CPU is
           not affected.
      
         - A less rigorous variant of the module retpoline check which merely
           warns when a non-retpoline protected module is loaded and reflects
           that fact in the sysfs file.
      
         - Prepare for Indirect Branch Prediction Barrier support.
      
         - Prepare for exposure of the Speculation Control MSRs to guests, so
           guest OSes which depend on those "features" can use them. Includes
           a blacklist of the broken microcodes. The actual exposure of the
           MSRs through KVM is still being worked on"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/speculation: Simplify indirect_branch_prediction_barrier()
        x86/retpoline: Simplify vmexit_fill_RSB()
        x86/cpufeatures: Clean up Spectre v2 related CPUID flags
        x86/cpu/bugs: Make retpoline module warning conditional
        x86/bugs: Drop one "mitigation" from dmesg
        x86/nospec: Fix header guards names
        x86/alternative: Print unadorned pointers
        x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
        x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
        x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
        x86/msr: Add definitions for new speculation control MSRs
        x86/cpufeatures: Add AMD feature bits for Speculation Control
        x86/cpufeatures: Add Intel feature bits for Speculation Control
        x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
        module/retpoline: Warn about missing retpoline in module
        KVM: VMX: Make indirect call speculation safe
        KVM: x86: Make indirect calls in emulator speculation safe
    • Linus Torvalds's avatar
      Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 94263352
      Linus Torvalds authored
      Pull x86 mm update from Thomas Gleixner:
       "A single patch which excludes the GART aperture from vmcore as
        accessing that area from a dump kernel can crash the kernel.
      
        Not necessarily the nicest way to fix this, but curing this from
        the ground up requires a more thorough rewrite of the whole kexec/kdump
        magic"
      
      * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/gart: Exclude GART aperture from vmcore
    • Linus Torvalds's avatar
      Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 36c289e7
      Linus Torvalds authored
      Pull x86 timer updates from Thomas Gleixner:
       "A small set of updates for x86 specific timers:
      
         - Mark TSC invariant on a subset of Centaur CPUs
      
         - Allow TSC calibration without PIT on mobile platforms which lack
           legacy devices"
      
      * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/centaur: Mark TSC invariant
        x86/tsc: Introduce early tsc clocksource
        x86/time: Unconditionally register legacy timer interrupt
        x86/tsc: Allow TSC calibration without PIT
    • Linus Torvalds's avatar
      Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 669c0f76
      Linus Torvalds authored
      Pull x86 platform updates from Thomas Gleixner:
       "The platform support for x86 contains the following updates:
      
         - A set of updates for the UV platform to support new CPUs and to fix
           some of the UV4A BAU MMRs
      
         - The initial platform support for the jailhouse hypervisor to allow
           native Linux guests (inmates) in non-root cells.
      
         - A fix for the PCI initialization on Intel MID platforms"
      
      * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
        x86/jailhouse: Respect pci=lastbus command line settings
        x86/jailhouse: Set X86_FEATURE_TSC_KNOWN_FREQ
        x86/platform/intel-mid: Move PCI initialization to arch_init()
        x86/platform/uv/BAU: Replace hard-coded values with MMR definitions
        x86/platform/UV: Fix UV4A BAU MMRs
        x86/platform/UV: Fix GAM MMR references in the UV x2apic code
        x86/platform/UV: Fix GAM MMR changes in UV4A
        x86/platform/UV: Add references to access fixed UV4A HUB MMRs
        x86/platform/UV: Fix UV4A support on new Intel Processors
        x86/platform/UV: Update uv_mmrs.h to prepare for UV4A fixes
        x86/jailhouse: Add PCI dependency
        x86/jailhouse: Hide x2apic code when CONFIG_X86_X2APIC=n
        x86/jailhouse: Initialize PCI support
        x86/jailhouse: Wire up IOAPIC for legacy UART ports
        x86/jailhouse: Halt instead of failing to restart
        x86/jailhouse: Silence ACPI warning
        x86/jailhouse: Avoid access of unsupported platform resources
        x86/jailhouse: Set up timekeeping
        x86/jailhouse: Enable PMTIMER
        x86/jailhouse: Enable APIC and SMP support
        ...
    • Linus Torvalds's avatar
      Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f0b13428
      Linus Torvalds authored
      Pull x86/cache updates from Thomas Gleixner:
       "A set of patches which add support for L2 cache partitioning to the
        Intel RDT facility"
      
      * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/intel_rdt: Add command line parameter to control L2_CDP
        x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG
        x86/intel_rdt: Add two new resources for L2 Code and Data Prioritization (CDP)
        x86/intel_rdt: Enumerate L2 Code and Data Prioritization (CDP) feature
        x86/intel_rdt: Add L2CDP support in documentation
        x86/intel_rdt: Update documentation
    • Linus Torvalds's avatar
      Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a46d3f9b
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "The timer department presents:
      
         - A rather large rework of the hrtimer infrastructure which
           introduces softirq based hrtimers to replace the spread of
           hrtimer/tasklet combos which force the actual callback execution
           into softirq context. The approach is completely different from the
           initial implementation which you rightfully cursed at 10 years ago.
      
           The softirq based timers have their own queues and there is no
           nasty indirection and list reshuffling in the hard interrupt
           anymore. This comes with conversion of some of the hrtimer/tasklet
           users, the rest and the final removal of that horrible interface
           will come towards the end of the merge window or go through the
           relevant maintainer trees.
      
           Note: The top commit merged the last-minute bugfix for the
           10-year-old CPU hotplug bug, as I wanted to make sure that I
           fatfinger the merge conflict resolution myself.
      
         - The overhaul of the STM32 clocksource/clockevents driver
      
         - A new driver for the Spreadtrum SC9860 timer
      
         - A new driver for the Actions Semi S700 timer
      
         - The usual set of fixes and updates all over the place"
      
      * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
        usb/gadget/NCM: Replace tasklet with softirq hrtimer
        ALSA/dummy: Replace tasklet with softirq hrtimer
        hrtimer: Implement SOFT/HARD clock base selection
        hrtimer: Implement support for softirq based hrtimers
        hrtimer: Prepare handling of hard and softirq based hrtimers
        hrtimer: Add clock bases and hrtimer mode for softirq context
        hrtimer: Use irqsave/irqrestore around __run_hrtimer()
        hrtimer: Factor out __hrtimer_next_event_base()
        hrtimer: Factor out __hrtimer_start_range_ns()
        hrtimer: Remove the 'base' parameter from hrtimer_reprogram()
        hrtimer: Make remote enqueue decision less restrictive
        hrtimer: Unify remote enqueue handling
        hrtimer: Unify hrtimer removal handling
        hrtimer: Make hrtimer_force_reprogramm() unconditionally available
        hrtimer: Make hrtimer_reprogramm() unconditional
        hrtimer: Make hrtimer_cpu_base.next_timer handling unconditional
        hrtimer: Make the remote enqueue check unconditional
        hrtimer: Use accesor functions instead of direct access
        hrtimer: Make the hrtimer_cpu_base::hres_active field unconditional, to simplify the code
        hrtimer: Make room in 'struct hrtimer_cpu_base'
        ...
    • Linus Torvalds's avatar
      Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7bcd3425
      Linus Torvalds authored
      Pull irq updates from Thomas Gleixner:
       "A rather small set of irq updates this time:
      
         - removal of the old and now obsolete irq domain debugging code
      
         - the new Goldfish PIC driver
      
         - the usual pile of small fixes and updates"
      
      * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqdomain: Kill CONFIG_IRQ_DOMAIN_DEBUG
        irq/work: Improve the flag definitions
        irqchip/gic-v3: Fix the driver probe() fail due to disabled GICC entry
        irqchip/irq-goldfish-pic: Add Goldfish PIC driver
        dt-bindings/goldfish-pic: Add device tree binding for Goldfish PIC driver
        irqchip/ompic: fix return value check in ompic_of_init()
        dt-bindings/bcm283x: Define polarity of per-cpu interrupts
        irqchip/irq-bcm2836: Add support for DT interrupt polarity
        dt-bindings/bcm2836-l1-intc: Add interrupt polarity support
    • Linus Torvalds's avatar
      Merge tag 'xtensa-20180129' of git://github.com/jcmvbkbc/linux-xtensa · d0bd31dc
      Linus Torvalds authored
      Pull Xtensa updates from Max Filippov:
      
       - add SSP support
      
       - add KASAN support
      
       - improvements to xtensa-specific assembly:
          - use ENTRY and ENDPROC consistently
          - clean up and unify word alignment macros
          - clean up and unify fixup marking
          - use 'call' instead of 'callx' where possible
      
       - various cleanups:
          - consolidate kernel stack size related definitions
          - replace #ifdef'fed/commented out debug printk statements with
            pr_debug
          - use struct exc_table instead of flat array for exception handling
            data
      
       - build kernel with -mtext-section-literals; simplify xtensa linker
         script
      
       - fix futex_atomic_cmpxchg_inatomic()
      
      * tag 'xtensa-20180129' of git://github.com/jcmvbkbc/linux-xtensa: (21 commits)
        xtensa: fix futex_atomic_cmpxchg_inatomic
        xtensa: shut up gcc-8 warnings
        xtensa: print kernel sections info in mem_init
        xtensa: use generic strncpy_from_user with KASAN
        xtensa: use __memset in __xtensa_clear_user
        xtensa: add support for KASAN
        xtensa: move fixmap and kmap just above the KSEG
        xtensa: don't clear swapper_pg_dir in paging_init
        xtensa: extract init_kio
        xtensa: implement early_trap_init
        xtensa: clean up exception handling structure
        xtensa: clean up custom-controlled debug output
        xtensa: enable stack protector
        xtensa: print hardware config ID on startup
        xtensa: consolidate kernel stack size related definitions
        xtensa: clean up functions in assembly code
        xtensa: clean up word alignment macros in assembly code
        xtensa: clean up fixups in assembly code
        xtensa: use call instead of callx in assembly code
        xtensa: build kernel with text-section-literals
        ...
    • Linus Torvalds's avatar
      Merge tag 'm68k-for-v4.16-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · aca21de2
      Linus Torvalds authored
      Pull m68k updates from Geert Uytterhoeven:
      
        - first part of an overhaul of the NuBus subsystem, to bring it up to
          modern driver model standards
      
        - a race condition fix for Mac
      
        - defconfig updates
      
      * tag 'm68k-for-v4.16-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        MAINTAINERS: Add NuBus subsystem entry
        m68k/mac: Fix race conditions in OSS interrupt dispatch
        nubus: Add support for the driver model
        nubus: Add expansion_type values for various Mac models
        nubus: Adopt standard linked list implementation
        nubus: Rename struct nubus_dev
        nubus: Rework /proc/bus/nubus/s/ implementation
        nubus: Generalize block resource handling
        nubus: Clean up whitespace
        nubus: Remove redundant code
        nubus: Call proc_mkdir() not more than once per slot directory
        nubus: Validate slot resource IDs
        nubus: Fix log spam
        nubus: Use static functions where possible
        nubus: Fix up header split
        nubus: Avoid array underflow and overflow
        m68k/defconfig: Update defconfigs for v4.15-rc1
  4. 29 Jan, 2018 1 commit
    • Linus Torvalds's avatar
      Merge tag 'for-4.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 31466f3e
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "Features or user visible changes:
      
         - fallocate: implement zero range mode
      
         - avoid losing data raid profile when deleting a device
      
         - tree item checker: more checks for directory items and xattrs
      
        Notable fixes:
      
         - raid56 recovery: don't use cached stripes, as they could
           potentially have changed and a later RMW or recovery would then
           lead to corruption or failures
      
         - let raid56 try harder to rebuild damaged data, reading from all
           stripes if necessary
      
         - fix scrub to repair raid56 in a similar way as in the case above
      
        Other:
      
         - cleanups: device freeing, removed some call indirections, redundant
           bio_put/_get, unused parameters, refactorings and renames
      
         - RCU list traversal fixups
      
         - simplify the mount callchain and remove the recursion back into
           mount when mounting a subvolume
      
         - use a block plug for fsync, which may improve bio merging on
           multiple devices
      
         - compression heuristic: replace heap sort with radix sort, which
           gains some performance
      
         - add extent map selftests, buffered write vs dio"
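The radix-sort change in the btrfs pull above is only named, not described. The following is a hedged illustration of the general technique only. It is a generic least-significant-digit (LSD) radix sort and not the btrfs code. It sorts 32-bit unsigned keys, such as the per-byte-value counts a compression heuristic might collect, using four byte-wide counting passes. The function name `radix_sort_u32` and its caller-supplied scratch buffer are hypothetical choices made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the btrfs implementation): LSD radix sort of
 * 32-bit unsigned keys in ascending order, one byte (256 buckets) per pass.
 * tmp must point to scratch space for at least n elements.
 */
void radix_sort_u32(uint32_t *a, uint32_t *tmp, size_t n)
{
    uint32_t *src = a, *dst = tmp;

    for (unsigned shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};

        /* Histogram of the byte selected by this pass. */
        for (size_t i = 0; i < n; i++)
            count[(src[i] >> shift) & 0xFFu]++;

        /* Exclusive prefix sum: count[b] becomes bucket b's start offset. */
        size_t pos = 0;
        for (size_t b = 0; b < 256; b++) {
            size_t c = count[b];
            count[b] = pos;
            pos += c;
        }

        /* Stable scatter: equal digits keep the order of earlier passes. */
        for (size_t i = 0; i < n; i++)
            dst[count[(src[i] >> shift) & 0xFFu]++] = src[i];

        /* The output of this pass is the input of the next one. */
        uint32_t *t = src;
        src = dst;
        dst = t;
    }
    /* Four passes is an even number of swaps, so the result ends up in a. */
}
```

Each pass is a stable counting sort on one byte. For fixed-width keys the whole sort therefore takes a constant number of linear passes instead of heap sort's O(n log n) comparisons. The trade-off is an extra n-element scratch buffer, which an in-place heap sort does not need. A caller that wants the largest counts first can read the sorted array from the end.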
      
      * tag 'for-4.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (155 commits)
        btrfs: drop devid as device_list_add() arg
        btrfs: get device pointer from device_list_add()
        btrfs: set the total_devices in device_list_add()
        btrfs: move pr_info into device_list_add
        btrfs: make btrfs_free_stale_devices() to match the path
        btrfs: rename btrfs_free_stale_devices() arg to skip_dev
        btrfs: make btrfs_free_stale_devices() argument optional
        btrfs: make btrfs_free_stale_device() to iterate all stales
        btrfs: no need to check for btrfs_fs_devices::seeding
        btrfs: Use IS_ALIGNED in btrfs_truncate_block instead of opencoding it
        Btrfs: noinline merge_extent_mapping
        Btrfs: add WARN_ONCE to detect unexpected error from merge_extent_mapping
        Btrfs: extent map selftest: dio write vs dio read
        Btrfs: extent map selftest: buffered write vs dio read
        Btrfs: add extent map selftests
        Btrfs: move extent map specific code to extent_map.c
        Btrfs: add helper for em merge logic
        Btrfs: fix unexpected EEXIST from btrfs_get_extent
        Btrfs: fix incorrect block_len in merge_extent_mapping
        btrfs: Remove unused readahead spinlock
        ...