1. 18 Aug, 2016 12 commits
    • Stanislaw Gruszka's avatar
      sched/cputime: Improve scalability by not accounting thread group tasks pending runtime · a1eb1411
      Stanislaw Gruszka authored
      Commit:
      
        d670ec13 ("posix-cpu-timers: Cure SMP wobbles")
      
      started accounting thread group tasks pending runtime in thread_group_cputime().
      
      Another commit:
      
        6e998916 ("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency")
      
      updated scheduler runtime statistics (call update_curr()) when reading task pending
      runtime. Those changes cause bad performance of SYS_times() and
      SYS_clock_gettimes(CLOCK_PROCESS_CPUTIME_ID) syscalls, especially on
      larger systems with many CPUs.
      
      While we would like to have cpuclock monotonicity kept i.e. have
      problems fixed by above commits stay fixed, we also would like to have
      good performance.
      
      However when we notice that change from commit d670ec13 is not
      longer needed to solve problem addressed by that commit, because of
      change from the second commit 6e998916, we can get room for
      optimization. Since we update task while reading it's pending runtime
      in task_sched_runtime(), clock_gettime(CLOCK_PROCESS_CPUTIME_ID) will
      see updated values and on testcase from d670ec13 process cpuclock
      will not be smaller than thread cpuclock.
      
      I tested the patch on testcases from commits d670ec13,
      6e998916 and some other cpuclock/cputimers testcases and
      did not found cpuclock monotonicity problems or other malfunction.
      
      This patch has the drawback that we will not provide thread group cputime
      up-to-date to the last moment. For example when arming cputime timer,
      we will arm it with possibly a bit outdated values and that timer will
      trigger earlier compared to behaviour without the patch. However that
      was the behaviour before d670ec13 commit (kernel v3.1) so it's
      unlikely to affect applications.
      
      Patch improves related syscall performance, as measured by Giovanni's
      benchmarks described in commit:
      
        6075620b ("sched/cputime: Mitigate performance regression in times()/clock_gettime()")
      
      The benchmark results are:
      
      SYS_clock_gettime():
      
        threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + patch
                               (pre-6e998916)
        2          3.48        2.23 ( 35.68%)        3.06 ( 11.83%)        1.08 ( 68.81%)
        5          3.33        2.83 ( 14.84%)        3.25 (  2.40%)        0.71 ( 78.55%)
        8          3.37        2.84 ( 15.80%)        3.26 (  3.30%)        0.56 ( 83.49%)
        12         3.32        3.09 (  6.69%)        3.37 ( -1.60%)        0.42 ( 87.28%)
        21         4.01        3.14 ( 21.70%)        3.90 (  2.74%)        0.35 ( 91.35%)
        30         3.63        3.28 (  9.75%)        3.36 (  7.41%)        0.28 ( 92.23%)
        48         3.71        3.02 ( 18.69%)        3.11 ( 16.27%)        0.39 ( 89.39%)
        79         3.75        2.88 ( 23.23%)        3.16 ( 15.74%)        0.46 ( 87.76%)
        110        3.81        2.95 ( 22.62%)        3.25 ( 14.80%)        0.56 ( 85.41%)
        128        3.88        3.05 ( 21.28%)        3.31 ( 14.76%)        0.62 ( 84.10%)
      
      SYS_times():
      
        threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + patch
                               (pre-6e998916)
        2          3.65        2.27 ( 37.94%)        3.25 ( 11.03%)        1.62 ( 55.71%)
        5          3.45        2.78 ( 19.34%)        3.17 (  7.92%)        2.33 ( 32.28%)
        8          3.52        2.79 ( 20.66%)        3.22 (  8.69%)        2.06 ( 41.44%)
        12         3.29        3.02 (  8.33%)        3.36 ( -2.04%)        2.00 ( 39.18%)
        21         4.07        3.10 ( 23.86%)        3.92 (  3.78%)        2.07 ( 49.18%)
        30         3.87        3.33 ( 13.80%)        3.40 ( 12.17%)        1.89 ( 51.12%)
        48         3.79        2.96 ( 21.94%)        3.16 ( 16.61%)        1.69 ( 55.46%)
        79         3.88        2.88 ( 25.82%)        3.28 ( 15.42%)        1.60 ( 58.81%)
        110        3.90        2.98 ( 23.73%)        3.38 ( 13.35%)        1.73 ( 55.61%)
        128        4.00        3.10 ( 22.40%)        3.38 ( 15.45%)        1.66 ( 58.52%)
      Reported-and-tested-by: default avatarGiovanni Gherdovich <ggherdovich@suse.cz>
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <mgalbraith@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/20160817093043.GA25206@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a1eb1411
    • Morten Rasmussen's avatar
      sched/fair: Let asymmetric CPU configurations balance at wake-up · 3273163c
      Morten Rasmussen authored
      Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if
      SD_BALANCE_WAKE is set on the sched_domains. For asymmetric
      configurations SD_WAKE_AFFINE is only desirable if the waking task's
      compute demand (utilization) is suitable for the waking CPU and the
      previous CPU, and all CPUs within their respective
      SD_SHARE_PKG_RESOURCES domains (sd_llc). If not, let wakeup balancing
      take over (find_idlest_{group, cpu}()).
      
      This patch makes affine wake-ups conditional on whether both the waker
      CPU and the previous CPU has sufficient capacity for the waking task,
      or not, assuming that the CPU capacities within an SD_SHARE_PKG_RESOURCES
      domain (sd_llc) are homogeneous.
      Signed-off-by: default avatarMorten Rasmussen <morten.rasmussen@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarVincent Guittot <vincent.guittot@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: freedom.tan@mediatek.com
      Cc: keita.kobayashi.ym@renesas.com
      Cc: mgalbraith@suse.de
      Cc: sgurrappadi@nvidia.com
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1469453670-2660-10-git-send-email-morten.rasmussen@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3273163c
    • Dietmar Eggemann's avatar
      sched/core: Store maximum per-CPU capacity in root domain · cd92bfd3
      Dietmar Eggemann authored
      To be able to compare the capacity of the target CPU with the highest
      available CPU capacity, store the maximum per-CPU capacity in the root
      domain.
      
      The max per-CPU capacity should be 1024 for all systems except SMT,
      where the capacity is currently based on smt_gain and the number of
      hardware threads and is <1024. If SMT can be brought to work with a
      per-thread capacity of 1024, this patch can be dropped and replaced by a
      hard-coded max capacity of 1024 (=SCHED_CAPACITY_SCALE).
      Signed-off-by: default avatarDietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: default avatarMorten Rasmussen <morten.rasmussen@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: freedom.tan@mediatek.com
      Cc: keita.kobayashi.ym@renesas.com
      Cc: mgalbraith@suse.de
      Cc: sgurrappadi@nvidia.com
      Cc: vincent.guittot@linaro.org
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/26c69258-9947-f830-a53e-0c54e7750646@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      cd92bfd3
    • Morten Rasmussen's avatar
      sched/core: Enable SD_BALANCE_WAKE for asymmetric capacity systems · 9ee1cda5
      Morten Rasmussen authored
      A domain with the SD_ASYM_CPUCAPACITY flag set indicate that
      sched_groups at this level and below do not include CPUs of all
      capacities available (e.g. group containing little-only or big-only CPUs
      in big.LITTLE systems). It is therefore necessary to put in more effort
      in finding an appropriate CPU at task wake-up by enabling balancing at
      wake-up (SD_BALANCE_WAKE) on all lower (child) levels.
      Signed-off-by: default avatarMorten Rasmussen <morten.rasmussen@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: freedom.tan@mediatek.com
      Cc: keita.kobayashi.ym@renesas.com
      Cc: mgalbraith@suse.de
      Cc: sgurrappadi@nvidia.com
      Cc: vincent.guittot@linaro.org
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1469453670-2660-8-git-send-email-morten.rasmussen@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9ee1cda5
    • Morten Rasmussen's avatar
      sched/core: Pass child domain into sd_init() · 3676b13e
      Morten Rasmussen authored
      If behavioural sched_domain flags depend on topology flags set at higher
      domain levels we need a way to update the child domain flags. Moving the
      child pointer assignment inside sd_init() should make that possible.
      Signed-off-by: default avatarMorten Rasmussen <morten.rasmussen@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: freedom.tan@mediatek.com
      Cc: keita.kobayashi.ym@renesas.com
      Cc: mgalbraith@suse.de
      Cc: sgurrappadi@nvidia.com
      Cc: vincent.guittot@linaro.org
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1469453670-2660-7-git-send-email-morten.rasmussen@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3676b13e
    • Morten Rasmussen's avatar
      sched/core: Introduce SD_ASYM_CPUCAPACITY sched_domain topology flag · 1f6e6c7c
      Morten Rasmussen authored
      Add a topology flag to the sched_domain hierarchy indicating the lowest
      domain level where the full range of CPU capacities is represented by
      the domain members for asymmetric capacity topologies (e.g. ARM
      big.LITTLE).
      
      The flag is intended to indicate that extra care should be taken when
      placing tasks on CPUs and this level spans all the different types of
      CPUs found in the system (no need to look further up the domain
      hierarchy). This information is currently only available through
      iterating through the capacities of all the CPUs at parent levels in the
      sched_domain hierarchy.
      
        SD 2      [  0      1      2      3]  SD_ASYM_CPUCAPACITY
      
        SD 1      [  0      1] [   2      3]  !SD_ASYM_CPUCAPACITY
      
        CPU:         0      1      2      3
        capacity:  756    756   1024   1024
      
      If the topology in the example above is duplicated to create an eight
      CPU example with third sched_domain level on top (SD 3), this level
      should not have the flag set (!SD_ASYM_CPUCAPACITY) as its two group
      would both have all CPU capacities represented within them.
      Signed-off-by: default avatarMorten Rasmussen <morten.rasmussen@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: freedom.tan@mediatek.com
      Cc: keita.kobayashi.ym@renesas.com
      Cc: mgalbraith@suse.de
      Cc: sgurrappadi@nvidia.com
      Cc: vincent.guittot@linaro.org
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1469453670-2660-6-git-send-email-morten.rasmussen@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1f6e6c7c
    • Morten Rasmussen's avatar
      sched/core: Remove unnecessary NULL-pointer check · 0e6d2a67
      Morten Rasmussen authored
      Checking if the sched_domain pointer returned by sd_init() is NULL seems
      pointless as sd_init() neither checks if it is valid to begin with nor
      set it to NULL.
      Signed-off-by: default avatarMorten Rasmussen <morten.rasmussen@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: freedom.tan@mediatek.com
      Cc: keita.kobayashi.ym@renesas.com
      Cc: mgalbraith@suse.de
      Cc: sgurrappadi@nvidia.com
      Cc: vincent.guittot@linaro.org
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1469453670-2660-5-git-send-email-morten.rasmussen@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0e6d2a67
    • Peter Zijlstra's avatar
      sched/core: Clarify SD_flags comment · 94f438c8
      Peter Zijlstra authored
      The SD_flags comment is very terse and doesn't explain why PACKING is
      odd.
      
      IIRC the distinction is that the 'normal' ones only describe topology,
      while the ASYM_PACKING one also prescribes behaviour. It is odd in the
      way that it doesn't only describe things.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: freedom.tan@mediatek.com
      Cc: keita.kobayashi.ym@renesas.com
      Cc: mgalbraith@suse.de
      Cc: sgurrappadi@nvidia.com
      Cc: vincent.guittot@linaro.org
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/20160815105459.GS6879@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      94f438c8
    • Ingo Molnar's avatar
    • Wanpeng Li's avatar
      sched/cputime: Resync steal time when guest & host lose sync · 03cbc732
      Wanpeng Li authored
      Commit:
      
        57430218 ("sched/cputime: Count actually elapsed irq & softirq time")
      
      ... fixed a bug but also triggered a regression:
      
      On an i5 laptop, 4 pCPUs, 4vCPUs for one full dynticks guest, there are four
      CPU hog processes(for loop) running in the guest, I hot-unplug the pCPUs
      on host one by one until there is only one left, then observe CPU utilization
      via 'top' in the guest, it shows:
      
        100% st for cpu0(housekeeping)
         75% st for other CPUs (nohz full mode)
      
      However, w/o this commit it shows the correct 75% for all four CPUs.
      
      When a guest is interrupted for a longer amount of time, missed clock ticks
      are not redelivered later. Because of that, we should not limit the amount
      of steal time accounted to the amount of time that the calling functions
      think have passed.
      
      However, the interval returned by account_other_time() is NOT rounded down
      to the nearest jiffy, while the base interval in get_vtime_delta() it is
      subtracted from is, so the max cputime limit is required to avoid underflow.
      
      This patch fixes the regression by limiting the account_other_time() from
      get_vtime_delta() to avoid underflow, and lets the other three call sites
      (in account_other_time() and steal_account_process_time()) account however
      much steal time the host told us elapsed.
      Suggested-by: default avatarRik van Riel <riel@redhat.com>
      Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Link: http://lkml.kernel.org/r/1471399546-4069-1-git-send-email-wanpeng.li@hotmail.com
      [ Improved the changelog. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      03cbc732
    • Rik van Riel's avatar
      sched: Remove struct rq::nohz_stamp · 1fc770d5
      Rik van Riel authored
      The nohz_stamp member of struct rq has been unused since 2010,
      when this commit removed the code that referenced it:
      
        396e894d ("sched: Revert nohz_ratelimit() for now")
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160815121410.5ea1c98f@annuminas.surriel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1fc770d5
    • Peter Zijlstra's avatar
      sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression · 173be9a1
      Peter Zijlstra authored
      Mike reports:
      
       Roughly 10% of the time, ltp testcase getrusage04 fails:
       getrusage04    0  TINFO  :  Expected timers granularity is 4000 us
       getrusage04    0  TINFO  :  Using 1 as multiply factor for max [us]time increment (1000+4000us)!
       getrusage04    0  TINFO  :  utime:           0us; stime:         179us
       getrusage04    0  TINFO  :  utime:        3751us; stime:           0us
       getrusage04    1  TFAIL  :  getrusage04.c:133: stime increased > 5000us:
      
      And tracked it down to the case where the task simply doesn't get
      _any_ [us]time ticks.
      
      Update the code to assume all rtime is utime when we lack information,
      thus ensuring a task that elides the tick gets time accounted.
      Reported-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Tested-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Fredrik Markstrom <fredrik.markstrom@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: stable@vger.kernel.org # 4.3+
      Fixes: 9d7fb042 ("sched/cputime: Guarantee stime + utime == rtime")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      173be9a1
  2. 15 Aug, 2016 5 commits
  3. 14 Aug, 2016 2 commits
    • Linus Torvalds's avatar
      Merge tag 'fixes-for-linus-4.8' of... · 118253a5
      Linus Torvalds authored
      Merge tag 'fixes-for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull h8300 and unicore32 architecture fixes from Guenter Roeck:
       "Two patches to fix h8300 and unicore32 builds.
      
        unicore32 builds have been broken since v4.6.  The fix has been
        available in -next since March of this year.
      
        h8300 builds have been broken since the last commit window.  The fix
        has been available in -next since June of this year"
      
      * tag 'fixes-for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        h8300: Add missing include file to asm/io.h
        unicore32: mm: Add missing parameter to arch_vma_access_permitted
      118253a5
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 120c5475
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - support for nr_cpus= command line argument (maxcpus was previously
         changed to allow secondary CPUs to be hot-plugged)
      
       - ARM PMU interrupt handling fix
      
       - fix potential TLB conflict in the hibernate code
      
       - improved handling of EL1 instruction aborts (better error reporting)
      
       - removal of useless jprobes code for stack saving/restoring
      
       - defconfig updates
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: defconfig: enable CONFIG_LOCALVERSION_AUTO
        arm64: defconfig: add options for virtualization and containers
        arm64: hibernate: handle allocation failures
        arm64: hibernate: avoid potential TLB conflict
        arm64: Handle el1 synchronous instruction aborts cleanly
        arm64: Remove stack duplicating code from jprobes
        drivers/perf: arm-pmu: Fix handling of SPI lacking "interrupt-affinity" property
        drivers/perf: arm-pmu: convert arm_pmu_mutex to spinlock
        arm64: Support hard limit of cpu count by nr_cpus
      120c5475
  4. 13 Aug, 2016 4 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 329f4152
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "KVM:
         - lock kvm_device list to prevent corruption on device creation.
      
        PPC:
         - split debugfs initialization from creation of the xics device to
           unlock the newly taken kvm lock earlier.
      
        s390:
         - prevent userspace from triggering two WARN_ON_ONCE.
      
        MIPS:
         - fix several issues in the management of TLB faults (Cc: stable)"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        MIPS: KVM: Propagate kseg0/mapped tlb fault errors
        MIPS: KVM: Fix gfn range check in kseg0 tlb faults
        MIPS: KVM: Add missing gfn range check
        MIPS: KVM: Fix mapped fault broken commpage handling
        KVM: Protect device ops->create and list_add with kvm->lock
        KVM: PPC: Move xics_debugfs_init out of create
        KVM: s390: reset KVM_REQ_MMU_RELOAD if mapping the prefix failed
        KVM: s390: set the prefix initially properly
      329f4152
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · a1e21033
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - an NVMe fix from Gabriel, fixing a suspend/resume issue on some
         setups
      
       - addition of a few missing entries in the block queue sysfs
         documentation, from Joe
      
       - a fix for a sparse shadow warning for the bvec iterator, from
         Johannes
      
       - a writeback deadlock involving raid issuing barriers, and not
         flushing the plug when we wakeup the flusher threads.  From
         Konstantin
      
       - a set of patches for the NVMe target/loop/rdma code, from Roland and
         Sagi
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        bvec: avoid variable shadowing warning
        doc: update block/queue-sysfs.txt entries
        nvme: Suspend all queues before deletion
        mm, writeback: flush plugged IO in wakeup_flusher_threads()
        nvme-rdma: Remove unused includes
        nvme-rdma: start async event handler after reconnecting to a controller
        nvmet: Fix controller serial number inconsistency
        nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
        nvmet-rdma: Correctly handle RDMA device hot removal
        nvme-rdma: Make sure to shutdown the controller if we can
        nvme-loop: Remove duplicate call to nvme_remove_namespaces
        nvme-rdma: Free the I/O tags when we delete the controller
        nvme-rdma: Remove duplicate call to nvme_remove_namespaces
        nvme-rdma: Fix device removal handling
        nvme-rdma: Queue ns scanning after a sucessful reconnection
        nvme-rdma: Don't leak uninitialized memory in connect request private data
      a1e21033
    • Guenter Roeck's avatar
      h8300: Add missing include file to asm/io.h · 2b05980d
      Guenter Roeck authored
      h8300 builds fail with
      
      arch/h8300/include/asm/io.h:9:15: error: unknown type name ‘u8’
      arch/h8300/include/asm/io.h:15:15: error: unknown type name ‘u16’
      arch/h8300/include/asm/io.h:21:15: error: unknown type name ‘u32’
      
      and many related errors.
      
      Fixes: 23c82d41bdf4 ("kexec-allow-architectures-to-override-boot-mapping-fix")
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      2b05980d
    • Guenter Roeck's avatar
      unicore32: mm: Add missing parameter to arch_vma_access_permitted · 783011b1
      Guenter Roeck authored
      unicore32 fails to compile with the following errors.
      
      mm/memory.c: In function ‘__handle_mm_fault’:
      mm/memory.c:3381: error:
      	too many arguments to function ‘arch_vma_access_permitted’
      mm/gup.c: In function ‘check_vma_flags’:
      mm/gup.c:456: error:
      	too many arguments to function ‘arch_vma_access_permitted’
      mm/gup.c: In function ‘vma_permits_fault’:
      mm/gup.c:640: error:
      	too many arguments to function ‘arch_vma_access_permitted’
      
      Fixes: d61172b4 ("mm/core, x86/mm/pkeys: Differentiate instruction fetches")
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Acked-by: default avatarGuan Xuetao <gxt@mprc.pku.edu.cn>
      783011b1
  5. 12 Aug, 2016 17 commits
    • Linus Torvalds's avatar
      Merge tag 'vfio-v4.8-rc2' of git://github.com/awilliam/linux-vfio · f31494bd
      Linus Torvalds authored
      Pull VFIO fix from Alex Williamson:
       "Fix oops when dereferencing empty data (Alex Williamson)"
      
      * tag 'vfio-v4.8-rc2' of git://github.com/awilliam/linux-vfio:
        vfio/pci: Fix NULL pointer oops in error interrupt setup handling
      f31494bd
    • Linus Torvalds's avatar
      Merge tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linux · b112324c
      Linus Torvalds authored
      Pull nfsd fixes from Bruce Fields:
       "Fixes for the dentry refcounting leak I introduced in 4.8-rc1, and for
        races in the LOCK code which appear to go back to the big nfsd state
        lock removal from 3.17"
      
      * tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linux:
        nfsd: don't return an unhashed lock stateid after taking mutex
        nfsd: Fix race between FREE_STATEID and LOCK
        nfsd: fix dentry refcounting on create
      b112324c
    • Linus Torvalds's avatar
      Merge tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 9710cb66
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "Two hibernation fixes allowing it to work with the recently added
        randomization of the kernel identity mapping base on x86-64 and one
        cpufreq driver regression fix.
      
        Specifics:
      
         - Fix the x86 identity mapping creation helpers to avoid the
           assumption that the base address of the mapping will always be
           aligned at the PGD level, as it may be aligned at the PUD level if
           address space randomization is enabled (Rafael Wysocki).
      
         - Fix the hibernation core to avoid executing tracing functions
           before restoring the processor state completely during resume
           (Thomas Garnier).
      
         - Fix a recently introduced regression in the powernv cpufreq driver
           that causes it to crash due to an out-of-bounds array access
           (Akshay Adiga)"
      
      * tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / hibernate: Restore processor state before using per-CPU variables
        x86/power/64: Always create temporary identity mapping correctly
        cpufreq: powernv: Fix crash in gpstate_timer_handler()
      9710cb66
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 01ea4439
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "This is bigger than usual - the reason is partly a pent-up stream of
        fixes after the merge window and partly accidental.  The fixes are:
      
         - five patches to fix a boot failure on Andy Lutomirsky's laptop
         - four SGI UV platform fixes
         - KASAN fix
         - warning fix
         - documentation update
         - swap entry definition fix
         - pkeys fix
         - irq stats fix"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe()
        x86/efi: Allocate a trampoline if needed in efi_free_boot_services()
        x86/boot: Rework reserve_real_mode() to allow multiple tries
        x86/boot: Defer setup_real_mode() to early_initcall time
        x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly
        x86/boot: Run reserve_bios_regions() after we initialize the memory map
        x86/irq: Do not substract irq_tlb_count from irq_call_count
        x86/mm: Fix swap entry comment and macro
        x86/mm/kaslr: Fix -Wformat-security warning
        x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation
        x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables
        x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems
        x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values
        x86/platform/UV: Fix bug with iounmap() of the UV4 EFI System Table causing a crash
        x86/platform/UV: Fix problem with UV4 Socket IDs not being contiguous
        x86/entry: Clarify the RF saving/restoring situation with SYSCALL/SYSRET
        x86/mm: Disable preemption during CR3 read+write
        x86/mm/KASLR: Increase BRK pages for KASLR memory randomization
        x86/mm/KASLR: Fix physical memory calculation on KASLR memory randomization
        x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text
      01ea4439
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3bc6d8c1
      Linus Torvalds authored
      Pull timer fixes from Ingo Molnar:
       "Misc fixes: a /dev/rtc regression fix, two APIC timer period
        calibration fixes, an ARM clocksource driver fix and a NOHZ
        power use regression fix"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/hpet: Fix /dev/rtc breakage caused by RTC cleanup
        x86/timers/apic: Inform TSC deadline clockevent device about recalibration
        x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
        timers: Fix get_next_timer_interrupt() computation
        clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered
      3bc6d8c1
    • Rafael J. Wysocki's avatar
      Merge branches 'pm-sleep' and 'pm-cpufreq' · 0aeeb3e7
      Rafael J. Wysocki authored
      * pm-sleep:
        PM / hibernate: Restore processor state before using per-CPU variables
        x86/power/64: Always create temporary identity mapping correctly
      
      * pm-cpufreq:
        cpufreq: powernv: Fix crash in gpstate_timer_handler()
      0aeeb3e7
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e6e7214f
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Misc fixes: cputime fixes, two deadline scheduler fixes and a cgroups
        scheduling fix"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/cputime: Fix omitted ticks passed in parameter
        sched/cputime: Fix steal time accounting
        sched/deadline: Fix lock pinning warning during CPU hotplug
        sched/cputime: Mitigate performance regression in times()/clock_gettime()
        sched/fair: Fix typo in sync_throttle()
        sched/deadline: Fix wrap-around in DL heap
      e6e7214f
    • Thomas Garnier's avatar
      PM / hibernate: Restore processor state before using per-CPU variables · 62822e2e
      Thomas Garnier authored
      Restore the processor state before calling any other functions to
      ensure per-CPU variables can be used with KASLR memory randomization.
      
      Tracing functions use per-CPU variables (GS based on x86) and one was
      called just before restoring the processor state fully. It resulted
      in a double fault when both the tracing & the exception handler
      functions tried to use a per-CPU variable.
      
      Fixes: bb3632c6 (PM / sleep: trace events for suspend/resume)
      Reported-and-tested-by: default avatarBorislav Petkov <bp@suse.de>
      Reported-by: default avatarJiri Kosina <jikos@kernel.org>
      Tested-by: default avatarRafael J. Wysocki <rafael@kernel.org>
      Tested-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarThomas Garnier <thgarnie@google.com>
      Acked-by: default avatarPavel Machek <pavel@ucw.cz>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      62822e2e
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ad83242a
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Mostly tooling fixes, plus two uncore-PMU fixes, an uprobes fix, a
        perf-cgroups fix and an AUX events fix"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Add enable_box for client MSR uncore
        perf/x86/intel/uncore: Fix uncore num_counters
        uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
        perf/core: Set cgroup in CPU contexts for new cgroup events
        perf/core: Fix sideband list-iteration vs. event ordering NULL pointer deference crash
        perf probe ppc64le: Fix probe location when using DWARF
        perf probe: Add function to post process kernel trace events
        tools: Sync cpufeatures headers with the kernel
        toops: Sync tools/include/uapi/linux/bpf.h with the kernel
        tools: Sync cpufeatures.h and vmx.h with the kernel
        perf probe: Support signedness casting
        perf stat: Avoid skew when reading events
        perf probe: Fix module name matching
        perf probe: Adjust map->reloc offset when finding kernel symbol from map
        perf hists: Trim libtraceevent trace_seq buffers
        perf script: Add 'bpf-output' field to usage message
      ad83242a
    • Jeff Layton's avatar
      nfsd: don't return an unhashed lock stateid after taking mutex · dd257933
      Jeff Layton authored
      nfsd4_lock will take the st_mutex before working with the stateid it
      gets, but between the time when we drop the cl_lock and take the mutex,
      the stateid could become unhashed (a'la FREE_STATEID). If that happens
      the lock stateid returned to the client will be forgotten.
      
      Fix this by first moving the st_mutex acquisition into
      lookup_or_create_lock_state. Then, have it check to see if the lock
      stateid is still hashed after taking the mutex. If it's not, then put
      the stateid and try the find/create again.
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Tested-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively.
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      dd257933
    • Linus Torvalds's avatar
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1f8083c6
      Linus Torvalds authored
      Pull locking fixes from Ingo Molnar:
       "Misc fixes: lockstat fix, futex fix on !MMU systems, big endian fix
        for qrwlocks and a race fix for pvqspinlocks"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/pvqspinlock: Fix a bug in qstat_read()
        locking/pvqspinlock: Fix double hash race
        locking/qrwlock: Fix write unlock bug on big endian systems
        futex: Assume all mappings are private on !MMU systems
      1f8083c6
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 25db6918
      Linus Torvalds authored
      Pull irq fix from Ingo Molnar:
       "A fix for an MSI regression"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/msi: Make sure PCI MSIs are activated early
      25db6918
    • Linus Torvalds's avatar
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0e1117b2
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "A fix for EFI capsules and an SGI UV platform fix"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/capsule: Allocate whole capsule into virtual memory
        x86/platform/uv: Skip UV runtime services mapping in the efi_runtime_disabled case
      0e1117b2
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 99091700
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       "Highlights include:
      
         - Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user
           needs multiple different security services (e.g.  krb5i and krb5p).
      
         - Stable patch to fix a regression introduced by the use of
           SO_REUSEPORT, and that prevented the use of multiple different NFS
           versions to the same server.
      
         - TCP socket reconnection timer fixes.
      
         - Patch from Neil to disable the use of IPv6 temporary addresses"
      
      * tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        NFSv4: Cap the transport reconnection timer at 1/2 lease period
        NFSv4: Cleanup the setting of the nfs4 lease period
        SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout
        SUNRPC: Fix reconnection timeouts
        NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED
        SUNRPC: disable the use of IPv6 temporary addresses.
        SUNRPC: allow for upcalls for same uid but different gss service
        SUNRPC: Fix up socket autodisconnect
        SUNRPC: Handle EADDRNOTAVAIL on connection failures
      99091700
    • Linus Torvalds's avatar
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · c239ae10
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
      
       - Fix for the nd_blk (NVDIMM Block Window Aperture) driver.
      
         A spec clarification requires the driver to mask off reserved bits in
         status register.  This is tagged for -stable back to the v4.2 kernel.
      
       - Fix for a kernel crash in the nvdimm unit tests when module loading
         is interrupted with SIGTERM.  Tagged for -stable since validation
         efforts external to Intel use the unit tests for qualifying
         backports.
      
       - Add a new 'size' sysfs attribute for the BTT (NVDIMM Block
         Translation Table) driver to make it symmetric with the other
         namespace personality drivers (PFN and DAX) that provide a size
         attribute for indicating how much namespace capacity is lost to
         metadata.
      
         The BTT change arrived at the start of the merge window and has
         appeared in a -next release.  It can technically wait for 4.9, but it
         is small, fixes asymmetry in the libnvdimm-sysfs interface, and
         something I would have squeezed into the v4.8 pull request had it
         arrived a few days earlier.
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        tools/testing/nvdimm: fix SIGTERM vs hotplug crash
        nvdimm, btt: add a size attribute for BTTs
        libnvdimm, nd_blk: mask off reserved status bits
      c239ae10
    • Linus Torvalds's avatar
      Merge tag 'sound-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 86fc0488
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A regression fix of HD-audio runtime PM and two USB quirks"
      
      * tag 'sound-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Manage power well properly for resume
        ALSA: usb-audio: Add quirk for ELP HD USB Camera
        ALSA: usb-audio: Add a sample rate quirk for Creative Live! Cam Socialize HD (VF0610)
      86fc0488
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 8766dc68
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Some powerpc fixes for 4.8:
      
        Misc:
         - powerpc/vdso: Fix build rules to rebuild vdsos correctly from Nicholas Piggin
         - powerpc/ptrace: Fix coredump since ptrace TM changes from Cyril Bur
         - powerpc/32: Fix csum_partial_copy_generic() from Christophe Leroy
         - cxl: Set psl_fir_cntl to production environment value from Frederic Barrat
         - powerpc/eeh: Switch to conventional PCI address output in EEH log from Guilherme G. Piccoli
         - cxl: Use fixed width predefined types in data structure. from Philippe Bergheaud
         - powerpc/vdso: Add missing include file from Guenter Roeck
         - powerpc: Fix unused function warning 'lmb_to_memblock' from Alastair D'Silva
         - powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again from Alexey Kardashevskiy
         - powerpc/cell: Add missing error code in spufs_mkgang() from Dan Carpenter
         - crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading from Anton Blanchard
         - powerpc/pasemi: Fix coherent_dma_mask for dma engine from Darren Stevens
      
        Benjamin Herrenschmidt:
         - powerpc/32: Fix crash during static key init
         - powerpc: Update obsolete comment in setup_32.c about early_init()
         - powerpc: Print the kernel load address at the end of prom_init()
         - powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
         - powerpc/xics: Properly set Edge/Level type and enable resend
      
        Mahesh Salgaonkar:
         - powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
         - powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
         - powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
         - powerpc/powernv: Load correct TOC pointer while waking up from winkle.
      
        Andrew Donnellan:
         - cxl: Fix sparse warnings
         - cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
      
        Michael Ellerman:
         - selftests/powerpc: Specify we expect to build with std=gnu99
         - powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
         - powerpc/pci: Fix endian bug in fixed PHB numbering"
      
      * tag 'powerpc-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (26 commits)
        selftests/powerpc: Specify we expect to build with std=gnu99
        powerpc/vdso: Fix build rules to rebuild vdsos correctly
        powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
        powerpc/32: Fix crash during static key init
        powerpc: Update obsolete comment in setup_32.c about early_init()
        powerpc: Print the kernel load address at the end of prom_init()
        powerpc/ptrace: Fix coredump since ptrace TM changes
        powerpc/32: Fix csum_partial_copy_generic()
        cxl: Set psl_fir_cntl to production environment value
        powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
        powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
        powerpc/pci: Fix endian bug in fixed PHB numbering
        powerpc/eeh: Switch to conventional PCI address output in EEH log
        cxl: Fix sparse warnings
        cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
        cxl: Use fixed width predefined types in data structure.
        powerpc/vdso: Add missing include file
        powerpc: Fix unused function warning 'lmb_to_memblock'
        powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
        powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
        ...
      8766dc68