1. 30 Jan, 2017 2 commits
  2. 22 Jan, 2017 1 commit
  3. 20 Jan, 2017 1 commit
    • Peter Zijlstra's avatar
      sched/clock: Fix hotplug crash · acb04058
      Peter Zijlstra authored
      Mike reported that he could trigger the WARN_ON_ONCE() in
      set_sched_clock_stable() using hotplug.
      
      This exposed a fundamental problem with the interface, we should never
      mark the TSC stable if we ever find it to be unstable. Therefore
      set_sched_clock_stable() is a broken interface.
      
      The reason it existed is that not having it is a pain, it means all
      relevant architecture code needs to call clear_sched_clock_stable()
      where appropriate.
      
      Of the three architectures that select HAVE_UNSTABLE_SCHED_CLOCK ia64
      and parisc are trivial in that they never called
      set_sched_clock_stable(), so add an unconditional call to
      clear_sched_clock_stable() to them.
      
      For x86 the story is a lot more involved, and what this patch tries to
      do is ensure we preserve the status quo. So even is Cyrix or Transmeta
      have usable TSC they never called set_sched_clock_stable() so they now
      get an explicit mark unstable.
      Reported-by: default avatarMike Galbraith <efault@gmx.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 9881b024 ("sched/clock: Delay switching sched_clock to stable")
      Link: http://lkml.kernel.org/r/20170119133633.GB6536@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      acb04058
  4. 19 Jan, 2017 1 commit
  5. 14 Jan, 2017 35 commits
    • Ingo Molnar's avatar
      locking/mutex, sched/wait: Fix the mutex_lock_io_nested() define · f21860ba
      Ingo Molnar authored
      Mike noticed this bogosity:
      
       > > +# define mutex_lock_nest_io(lock, nest_lock) mutex_io(lock)
       >                                                 ^^^^^^^^^^^^^^ typo
      
      This new locking API is not used yet, so this didn't trigger in testing.
      
      Fix it.
      Reported-by: default avatarMike Galbraith <efault@gmx.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adilger.kernel@dilger.ca
      Cc: jack@suse.com
      Cc: kernel-team@fb.com
      Cc: mingbo@fb.com
      Cc: tytso@mit.edu
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f21860ba
    • Tejun Heo's avatar
      fs/jbd2, locking/mutex, sched/wait: Use mutex_lock_io() for journal->j_checkpoint_mutex · 6fa7aa50
      Tejun Heo authored
      When an ext4 fs is bogged down by a lot of metadata IOs (in the
      reported case, it was deletion of millions of files, but any massive
      amount of journal writes would do), after the journal is filled up,
      tasks which try to access the filesystem and aren't currently
      performing the journal writes end up waiting in
      __jbd2_log_wait_for_space() for journal->j_checkpoint_mutex.
      
      Because those mutex sleeps aren't marked as iowait, this condition can
      lead to misleadingly low iowait and /proc/stat:procs_blocked.  While
      iowait propagation is far from strict, this condition can be triggered
      fairly easily and annotating these sleeps correctly helps initial
      diagnosis quite a bit.
      
      Use the new mutex_lock_io() for journal->j_checkpoint_mutex so that
      these sleeps are properly marked as iowait.
      Reported-by: default avatarMingbo Wan <mingbo@fb.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-team@fb.com
      Link: http://lkml.kernel.org/r/1477673892-28940-5-git-send-email-tj@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6fa7aa50
    • Tejun Heo's avatar
      locking/mutex, sched/wait: Add mutex_lock_io() · 1460cb65
      Tejun Heo authored
      We sometimes end up propagating IO blocking through mutexes; however,
      because there currently is no way of annotating mutex sleeps as
      iowait, there are cases where iowait and /proc/stat:procs_blocked
      report misleading numbers obscuring the actual state of the system.
      
      This patch adds mutex_lock_io() so that mutex sleeps can be marked as
      iowait in those cases.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adilger.kernel@dilger.ca
      Cc: jack@suse.com
      Cc: kernel-team@fb.com
      Cc: mingbo@fb.com
      Cc: tytso@mit.edu
      Link: http://lkml.kernel.org/r/1477673892-28940-4-git-send-email-tj@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1460cb65
    • Tejun Heo's avatar
      sched/core: Separate out io_schedule_prepare() and io_schedule_finish() · 10ab5643
      Tejun Heo authored
      Now that IO schedule accounting is done inside __schedule(),
      io_schedule() can be split into three steps - prep, schedule, and
      finish - where the schedule part doesn't need any special annotation.
      This allows marking a sleep as iowait by simply wrapping an existing
      blocking function with io_schedule_prepare() and io_schedule_finish().
      
      Because task_struct->in_iowait is single bit, the caller of
      io_schedule_prepare() needs to record and the pass its state to
      io_schedule_finish() to be safe regarding nesting.  While this isn't
      the prettiest, these functions are mostly gonna be used by core
      functions and we don't want to use more space for ->in_iowait.
      
      While at it, as it's simple to do now, reimplement io_schedule()
      without unnecessarily going through io_schedule_timeout().
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adilger.kernel@dilger.ca
      Cc: jack@suse.com
      Cc: kernel-team@fb.com
      Cc: mingbo@fb.com
      Cc: tytso@mit.edu
      Link: http://lkml.kernel.org/r/1477673892-28940-3-git-send-email-tj@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      10ab5643
    • Tejun Heo's avatar
      sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler · e33a9bba
      Tejun Heo authored
      For an interface to support blocking for IOs, it must call
      io_schedule() instead of schedule().  This makes it tedious to add IO
      blocking to existing interfaces as the switching between schedule()
      and io_schedule() is often buried deep.
      
      As we already have a way to mark the task as IO scheduling, this can
      be made easier by separating out io_schedule() into multiple steps so
      that IO schedule preparation can be performed before invoking a
      blocking interface and the actual accounting happens inside the
      scheduler.
      
      io_schedule_timeout() does the following three things prior to calling
      schedule_timeout().
      
       1. Mark the task as scheduling for IO.
       2. Flush out plugged IOs.
       3. Account the IO scheduling.
      
      done close to the actual scheduling.  This patch moves #3 into the
      scheduler so that later patches can separate out preparation and
      finish steps from io_schedule().
      Patch-originally-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adilger.kernel@dilger.ca
      Cc: akpm@linux-foundation.org
      Cc: axboe@kernel.dk
      Cc: jack@suse.com
      Cc: kernel-team@fb.com
      Cc: mingbo@fb.com
      Cc: tytso@mit.edu
      Link: http://lkml.kernel.org/r/20161207204841.GA22296@htj.duckdns.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e33a9bba
    • Dietmar Eggemann's avatar
      sched/fair: Explain why MIN_SHARES isn't scaled in calc_cfs_shares() · b8fd8423
      Dietmar Eggemann authored
      Signed-off-by: default avatarDietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Turner <pjt@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e9a4d858-bcf3-36b9-e3a9-449953e34569@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b8fd8423
    • Vincent Guittot's avatar
      sched/core: Fix group_entity's share update · 89ee048f
      Vincent Guittot authored
      The update of the share of a cfs_rq is done when its load_avg is updated
      but before the group_entity's load_avg has been updated for the past time
      slot. This generates wrong load_avg accounting which can be significant
      when small tasks are involved in the scheduling.
      
      Let take the example of a task a that is dequeued of its task group A:
         root
        (cfs_rq)
          \
          (se)
           A
          (cfs_rq)
            \
            (se)
             a
      
      Task "a" was the only task in task group A which becomes idle when a is
      dequeued.
      
      We have the sequence:
      
      - dequeue_entity a->se
          - update_load_avg(a->se)
          - dequeue_entity_load_avg(A->cfs_rq, a->se)
          - update_cfs_shares(A->cfs_rq)
      	A->cfs_rq->load.weight == 0
              A->se->load.weight is updated with the new share (0 in this case)
      - dequeue_entity A->se
          - update_load_avg(A->se) but its weight is now null so the last time
            slot (up to a tick) will be accounted with a weight of 0 instead of
            its real weight during the time slot. The last time slot will be
            accounted as an idle one whereas it was a running one.
      
      If the running time of task a is short enough that no tick happens when it
      runs, all running time of group entity A->se will be accounted as idle
      time.
      
      Instead, we should update the share of a cfs_rq (in fact the weight of its
      group entity) only after having updated the load_avg of the group_entity.
      
      update_cfs_shares() now takes the sched_entity as a parameter instead of the
      cfs_rq, and the weight of the group_entity is updated only once its load_avg
      has been synced with current time.
      Signed-off-by: default avatarVincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: pjt@google.com
      Link: http://lkml.kernel.org/r/1482335426-7664-1-git-send-email-vincent.guittot@linaro.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      89ee048f
    • Peter Zijlstra's avatar
      sched/completions: Fix complete_all() semantics · da9647e0
      Peter Zijlstra authored
      Documentation/scheduler/completion.txt says this about complete_all():
      
        "calls complete_all() to signal all current and future waiters."
      
      Which doesn't strictly match the current semantics. Currently
      complete_all() is equivalent to UINT_MAX/2 complete() invocations,
      which is distinctly less than 'all current and future waiters'
      (enumerable vs innumerable), although it has worked in practice.
      
      However, Dmitry had a weird case where it might matter, so change
      completions to use saturation semantics for complete()/complete_all().
      Once done hits UINT_MAX (and complete_all() sets it there) it will
      never again be decremented.
      Requested-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: der.herr@hofr.at
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      da9647e0
    • Tommaso Cucinotta's avatar
      sched/deadline: Show leftover runtime and abs deadline in /proc/*/sched · 59f8c298
      Tommaso Cucinotta authored
      This patch allows for reading the current (leftover) runtime and
      absolute deadline of a SCHED_DEADLINE task through /proc/*/sched
      (entries dl.runtime and dl.deadline), while debugging/testing.
      Signed-off-by: default avatarTommaso Cucinotta <tommaso.cucinotta@sssup.it>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarJuri Lelli <juri.lelli@arm.com>
      Reviewed-by: default avatarLuca Abeni <luca.abeni@unitn.it>
      Acked-by: default avatarDaniel Bistrot de Oliveira <danielbristot@gmail.com>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1477473437-10346-2-git-send-email-tommaso.cucinotta@sssup.itSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      59f8c298
    • Peter Zijlstra's avatar
      sched/clock: Provide better clock continuity · 5680d809
      Peter Zijlstra authored
      When switching between the unstable and stable variants it is
      currently possible that clock discontinuities occur.
      
      And while these will mostly be 'small', attempt to do better.
      
      As observed on my IVB-EP, the sched_clock() is ~1.5s ahead of the
      ktime_get_ns() based timeline at the point of switchover
      (sched_clock_init_late()) after SMP bringup.
      
      Equally, when the TSC is later found to be unstable -- typically
      because SMM tries to hide its SMI latencies by mucking with the TSC --
      we want to avoid large jumps.
      
      Since the clocksource watchdog reports the issue after the fact we
      cannot exactly fix up time, but since SMI latencies are typically
      small (~10ns range), the discontinuity is mainly due to drift between
      sched_clock() and ktime_get_ns() (which on my desktop is ~79s over
      24days).
      
      I dislike this patch because it adds overhead to the good case in
      favour of dealing with badness. But given the widespread failure of
      TSC stability this is worth it.
      
      Note that in case the TSC makes drastic jumps after SMP bringup we're
      still hosed. There's just not much we can do in that case without
      stupid overhead.
      
      If we were to somehow expose tsc_clocksource_reliable (which is hard
      because this code is also used on ia64 and parisc) we could avoid some
      of the newly introduced overhead.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5680d809
    • Peter Zijlstra's avatar
      sched/clock: Delay switching sched_clock to stable · 9881b024
      Peter Zijlstra authored
      Currently we switch to the stable sched_clock if we guess the TSC is
      usable, and then switch back to the unstable path if it turns out TSC
      isn't stable during SMP bringup after all.
      
      Delay switching to the stable path until after SMP bringup is
      complete. This way we'll avoid switching during the time we detect the
      worst of the TSC offences.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9881b024
    • Peter Zijlstra's avatar
      sched/clock: Update static_key usage · 555570d7
      Peter Zijlstra authored
      sched_clock was still using the deprecated static_key interface.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      555570d7
    • Thomas Gleixner's avatar
      sched/clock, clocksource: Add optional cs::mark_unstable() method · 12907fbb
      Thomas Gleixner authored
      PeterZ reported that we'd fail to mark the TSC unstable when the
      clocksource watchdog finds it unsuitable.
      
      Allow a clocksource to run a custom action when its being marked
      unstable and hook up the TSC unstable code.
      Reported-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      12907fbb
    • Matt Fleming's avatar
      sched/core: Add debugging code to catch missing update_rq_clock() calls · cb42c9a3
      Matt Fleming authored
      There's no diagnostic checks for figuring out when we've accidentally
      missed update_rq_clock() calls. Let's add some by piggybacking on the
      rq_*pin_lock() wrappers.
      
      The idea behind the diagnostic checks is that upon pining rq lock the
      rq clock should be updated, via update_rq_clock(), before anybody
      reads the clock with rq_clock() or rq_clock_task().
      
      The exception to this rule is when updates have explicitly been
      disabled with the rq_clock_skip_update() optimisation.
      
      There are some functions that only unpin the rq lock in order to grab
      some other lock and avoid deadlock. In that case we don't need to
      update the clock again and the previous diagnostic state can be
      carried over in rq_repin_lock() by saving the state in the rq_flags
      context.
      
      Since this patch adds a new clock update flag and some already exist
      in rq::clock_skip_update, that field has now been renamed. An attempt
      has been made to keep the flag manipulation code small and fast since
      it's used in the heart of the __schedule() fast path.
      
      For the !CONFIG_SCHED_DEBUG case the only object code change (other
      than addresses) is the following change to reset RQCF_ACT_SKIP inside
      of __schedule(),
      
        -       c7 83 38 09 00 00 00    movl   $0x0,0x938(%rbx)
        -       00 00 00
        +       83 a3 38 09 00 00 fc    andl   $0xfffffffc,0x938(%rbx)
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-8-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      cb42c9a3
    • Peter Zijlstra's avatar
      sched/core: Add missing update_rq_clock() call in set_user_nice() · 2fb8d367
      Peter Zijlstra authored
      Address this rq-clock update bug:
      
        WARNING: CPU: 30 PID: 195 at ../kernel/sched/sched.h:797 set_next_entity()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          set_next_entity()
          ? _raw_spin_lock()
          set_curr_task_fair()
          set_user_nice.part.85()
          set_user_nice()
          create_worker()
          worker_thread()
          kthread()
          ret_from_fork()
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2fb8d367
    • Peter Zijlstra's avatar
      sched/core: Add missing update_rq_clock() call for task_hot() · 3bed5e21
      Peter Zijlstra authored
      Add the update_rq_clock() call at the top of the callstack instead of
      at the bottom where we find it missing, this to aid later effort to
      minimize the number of update_rq_lock() calls.
      
        WARNING: CPU: 30 PID: 194 at ../kernel/sched/sched.h:797 assert_clock_updated()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          assert_clock_updated.isra.63.part.64()
          can_migrate_task()
          load_balance()
          pick_next_task_fair()
          __schedule()
          schedule()
          worker_thread()
          kthread()
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3bed5e21
    • Peter Zijlstra's avatar
      sched/core: Add missing update_rq_clock() in detach_task_cfs_rq() · 80f5c1b8
      Peter Zijlstra authored
      Instead of adding the update_rq_clock() all the way at the bottom of
      the callstack, add one at the top, this to aid later effort to
      minimize update_rq_lock() calls.
      
        WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          detach_task_cfs_rq()
          switched_from_fair()
          __sched_setscheduler()
          _sched_setscheduler()
          sched_set_stop_task()
          cpu_stop_create()
          __smpboot_create_thread.part.2()
          smpboot_register_percpu_thread_cpumask()
          cpu_stop_init()
          do_one_initcall()
          ? print_cpu_info()
          kernel_init_freeable()
          ? rest_init()
          kernel_init()
          ret_from_fork()
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      80f5c1b8
    • Peter Zijlstra's avatar
      sched/core: Add missing update_rq_clock() in post_init_entity_util_avg() · 4126bad6
      Peter Zijlstra authored
      Address this rq-clock update bug:
      
        WARNING: CPU: 0 PID: 0 at ../kernel/sched/sched.h:797 post_init_entity_util_avg()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          __warn()
          post_init_entity_util_avg()
          wake_up_new_task()
          _do_fork()
          kernel_thread()
          rest_init()
          start_kernel()
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4126bad6
    • Matt Fleming's avatar
      sched/fair: Push rq lock pin/unpin into idle_balance() · 46f69fa3
      Matt Fleming authored
      Future patches will emit warnings if rq_clock() is called before
      update_rq_clock() inside a rq_pin_lock()/rq_unpin_lock() pair.
      
      Since there is only one caller of idle_balance() we can push the
      unpin/repin there.
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-7-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      46f69fa3
    • Matt Fleming's avatar
      sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock · 92509b73
      Matt Fleming authored
      rq_clock() is called from sched_info_{depart,arrive}() after resetting
      RQCF_ACT_SKIP but prior to a call to update_rq_clock().
      
      In preparation for pending patches that check whether the rq clock has
      been updated inside of a pin context before rq_clock() is called, move
      the reset of rq->clock_skip_update immediately before unpinning the rq
      lock.
      
      This will avoid the new warnings which check if update_rq_clock() is
      being actively skipped.
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-6-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      92509b73
    • Matt Fleming's avatar
      sched/core: Add wrappers for lockdep_(un)pin_lock() · d8ac8971
      Matt Fleming authored
      In preparation for adding diagnostic checks to catch missing calls to
      update_rq_clock(), provide wrappers for (re)pinning and unpinning
      rq->lock.
      
      Because the pending diagnostic checks allow state to be maintained in
      rq_flags across pin contexts, swap the 'struct pin_cookie' arguments
      for 'struct rq_flags *'.
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-5-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d8ac8971
    • Frederic Weisbecker's avatar
      sched/cputime: Rename vtime_account_user() to vtime_flush() · c8d7dabf
      Frederic Weisbecker authored
      CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y used to accumulate user time and
      account it on ticks and context switches only through the
      vtime_account_user() function.
      
      Now this model has been generalized on the 3 archs for all kind of
      cputime (system, irq, ...) and all the cputime flushing happens under
      vtime_account_user().
      
      So let's rename this function to better reflect its new role.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-11-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c8d7dabf
    • Martin Schwidefsky's avatar
      sched/cputime, s390: Implement delayed accounting of system time · b7394a5f
      Martin Schwidefsky authored
      The account_system_time() function is called with a cputime that
      occurred while running in the kernel. The function detects which
      context the CPU is currently running in and accounts the time to
      the correct bucket. This forces the arch code to account the
      cputime for hardirq and softirq immediately.
      
      Such accounting function can be costly and perform unwelcome divisions
      and multiplications, among others.
      
      The arch code can delay the accounting for system time. For s390
      the accounting is done once per timer tick and for each task switch.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      [ Rebase against latest linus tree and move account_system_index_scaled(). ]
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-10-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b7394a5f
    • Frederic Weisbecker's avatar
      sched/cputime, ia64: Accumulate cputime and account only on tick/task switch · 7dd58230
      Frederic Weisbecker authored
      Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y accounts the cputime on
      any context boundary: irq entry/exit, guest entry/exit, context switch,
      etc...
      
      Calling functions such as account_system_time(), account_user_time()
      and such can be costly, especially if they are called on many fastpath
      such as twice per IRQ. Those functions do more than just accounting to
      kcpustat and task cputime. Depending on the config, some subsystems can
      perform unpleasant multiplications and divisions, among other things.
      
      So lets accumulate the cputime instead and delay the accounting on ticks
      and context switches only.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-9-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7dd58230
    • Frederic Weisbecker's avatar
      sched/cputime, powerpc/vtime: Accumulate cputime and account only on tick/task switch · a19ff1a2
      Frederic Weisbecker authored
      Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y accounts the cputime on
      any context boundary: irq entry/exit, guest entry/exit, context switch,
      etc...
      
      Calling functions such as account_system_time(), account_user_time()
      and such can be costly, especially if they are called on many fastpath
      such as twice per IRQ. Those functions do more than just accounting to
      kcpustat and task cputime. Depending on the config, some subsystems can
      perform unpleasant multiplications and divisions, among other things.
      
      So lets accumulate the cputime instead and delay the accounting on ticks
      and context switches only.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-8-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a19ff1a2
    • Frederic Weisbecker's avatar
      sched/cputime, powerpc: Migrate stolen_time field to the accounting structure · f828c3d0
      Frederic Weisbecker authored
      That in order to gather all cputime accumulation to the same place.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-7-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f828c3d0
    • Frederic Weisbecker's avatar
      sched/cputime, powerpc: Prepare accounting structure for cputime flush on tick · 8c8b73c4
      Frederic Weisbecker authored
      In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
      cputime accounting to the tick, provide finegrained accumulators to
      powerpc in order to store the cputime until flushing.
      
      While at it, normalize the name of several fields according to common
      cputime naming.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-6-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8c8b73c4
    • Frederic Weisbecker's avatar
      sched/cputime: Export account_guest_time() · 1213699a
      Frederic Weisbecker authored
      In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
      cputime accounting to the tick, let's allow archs to account cputime
      directly to gtime.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-5-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1213699a
    • Frederic Weisbecker's avatar
      sched/cputime: Allow accounting system time using cpustat index · c31cc6a5
      Frederic Weisbecker authored
      In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
      cputime accounting to the tick, let's provide APIs to account system
      time to precise contexts: hardirq, softirq, pure system, ...
      Inspired-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-4-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c31cc6a5
    • Frederic Weisbecker's avatar
      sched/cputime, ia64: Fix incorrect start cputime assignment on task switch · 8388d214
      Frederic Weisbecker authored
      On task switch we must initialize the current cputime of the next task
      using the value of the previous task which got freshly updated.
      
      But we are confusing that with doing the opposite, which should result
      in incorrect cputime accounting.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-3-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8388d214
    • Frederic Weisbecker's avatar
      sched/cputime, powerpc32: Fix stale scaled stime on context switch · 90d08ba2
      Frederic Weisbecker authored
      On context switch with powerpc32, the cputime is accumulated in the
      thread_info struct. So the switching-in task must move forward its
      start time snapshot to the current time in order to later compute the
      delta spent in system mode.
      
      This is what we do for the normal cputime by initializing the starttime
      field to the value of the previous task's starttime which got freshly
      updated.
      
      But we are missing the update of the scaled cputime start time. As a
      result we may be accounting too much scaled cputime later.
      
      Fix this by initializing the scaled cputime the same way we do for
      normal cputime.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-2-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      90d08ba2
    • Linus Torvalds's avatar
      Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · e96f8f18
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "These are all over the place.
      
        The tracepoint part of the pull fixes a crash and adds a little more
        information to two tracepoints, while the rest are good old fashioned
        fixes"
      
      * 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: make tracepoint format strings more compact
        Btrfs: add truncated_len for ordered extent tracepoints
        Btrfs: add 'inode' for extent map tracepoint
        btrfs: fix crash when tracepoint arguments are freed by wq callbacks
        Btrfs: adjust outstanding_extents counter properly when dio write is split
        Btrfs: fix lockdep warning about log_mutex
        Btrfs: use down_read_nested to make lockdep silent
        btrfs: fix locking when we put back a delayed ref that's too new
        btrfs: fix error handling when run_delayed_extent_op fails
        btrfs: return the actual error value from  from btrfs_uuid_tree_iterate
      e96f8f18
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client · 04e39627
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Two small fixups for the filesystem changes that went into this merge
        window"
      
      * tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client:
        ceph: fix get_oldest_context()
        ceph: fix mds cluster availability check
      04e39627
    • Linus Torvalds's avatar
      Merge tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio · af54efa4
      Linus Torvalds authored
      Pull VFIO fixes from Alex Williamson:
      
       - Cleanups and bug fixes for the mtty sample driver (Dan Carpenter)
      
       - Export and make use of has_capability() to fix incorrect use of
         ns_capable() for testing task capabilities (Jike Song)
      
      * tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio:
        vfio/type1: Remove pid_namespace.h include
        vfio iommu type1: fix the testing of capability for remote task
        capability: export has_capability
        vfio-mdev: remove some dead code
        vfio-mdev: buffer overflow in ioctl()
        vfio-mdev: return -EFAULT if copy_to_user() fails
      af54efa4
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 406732c9
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
      
       - fix for module unload vs deferred jump labels (note: there might be
         other buggy modules!)
      
       - two NULL pointer dereferences from syzkaller
      
       - also syzkaller: fix emulation of fxsave/fxrstor/sgdt/sidt, problem
         made worse during this merge window, "just" kernel memory leak on
         releases
      
       - fix emulation of "mov ss" - somewhat serious on AMD, less so on Intel
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: fix emulation of "MOV SS, null selector"
        KVM: x86: fix NULL deref in vcpu_scan_ioapic
        KVM: eventfd: fix NULL deref irqbypass consumer
        KVM: x86: Introduce segmented_write_std
        KVM: x86: flush pending lapic jump label updates on module unload
        jump_labels: API for flushing deferred jump label updates
      406732c9