1. 30 Mar, 2017 2 commits
    • Chris Wilson's avatar
      locking/ww-mutex: Limit stress test to 2 seconds · 57dd924e
      Chris Wilson authored
      Use a timeout rather than a fixed number of loops to avoid running for
      very long periods, such as under the kbuilder VMs.
      Reported-by: default avatarkernel test robot <xiaolong.ye@intel.com>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170310105733.6444-1-chris@chris-wilson.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      57dd924e
    • Peter Zijlstra's avatar
      locking/atomic: Fix atomic_try_cmpxchg() semantics · 44fe8445
      Peter Zijlstra authored
      Dmitry noted that the new atomic_try_cmpxchg() primitive is broken when
      the old pointer doesn't point to the local stack.
      
      He writes:
      
        "Consider a classical lock-free stack push:
      
          node->next = atomic_read(&head);
          do {
          } while (!atomic_try_cmpxchg(&head, &node->next, node));
      
        This code is broken with the current implementation, the problem is
        with unconditional update of *__po.
      
        In case of success it writes the same value back into *__po, but in
        case of cmpxchg success we might have lose ownership of some memory
        locations and potentially over what __po has pointed to. The same
        holds for the re-read of *__po. "
      
      He also points out that this makes it surprisingly different from the
      similar C/C++ atomic operation.
      
      After investigating the code-gen differences caused by this patch; and
      a number of alternatives (Linus dislikes this interface lots), we
      arrived at these results (size x86_64-defconfig/vmlinux):
      
        GCC-6.3.0:
      
        10735757        cmpxchg
        10726413        try_cmpxchg
        10730509        try_cmpxchg + patch
        10730445        try_cmpxchg-linus
      
        GCC-7 (20170327):
      
        10709514        cmpxchg
        10704266        try_cmpxchg
        10704266        try_cmpxchg + patch
        10704394        try_cmpxchg-linus
      
      From this we see that the patch has the advantage of better code-gen
      on GCC-7 and keeps the interface roughly consistent with the C
      language variant.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Fixes: a9ebf306 ("locking/atomic: Introduce atomic_try_cmpxchg()")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      44fe8445
  2. 26 Mar, 2017 1 commit
  3. 23 Mar, 2017 16 commits
    • Peter Zijlstra's avatar
      futex: Drop hb->lock before enqueueing on the rtmutex · 56222b21
      Peter Zijlstra authored
      When PREEMPT_RT_FULL does the spinlock -> rt_mutex substitution the PI
      chain code will (falsely) report a deadlock and BUG.
      
      The problem is that it hold hb->lock (now an rt_mutex) while doing
      task_blocks_on_rt_mutex on the futex's pi_state::rtmutex. This, when
      interleaved just right with futex_unlock_pi() leads it to believe to see an
      AB-BA deadlock.
      
        Task1 (holds rt_mutex,	Task2 (does FUTEX_LOCK_PI)
               does FUTEX_UNLOCK_PI)
      
      				lock hb->lock
      				lock rt_mutex (as per start_proxy)
        lock hb->lock
      
      Which is a trivial AB-BA.
      
      It is not an actual deadlock, because it won't be holding hb->lock by the
      time it actually blocks on the rt_mutex, but the chainwalk code doesn't
      know that and it would be a nightmare to handle this gracefully.
      
      To avoid this problem, do the same as in futex_unlock_pi() and drop
      hb->lock after acquiring wait_lock. This still fully serializes against
      futex_unlock_pi(), since adding to the wait_list does the very same lock
      dance, and removing it holds both locks.
      
      Aside of solving the RT problem this makes the lock and unlock mechanism
      symetric and reduces the hb->lock held time.
      Reported-and-tested-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Suggested-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104152.161341537@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      56222b21
    • Peter Zijlstra's avatar
      futex: Futex_unlock_pi() determinism · bebe5b51
      Peter Zijlstra authored
      The problem with returning -EAGAIN when the waiter state mismatches is that
      it becomes very hard to proof a bounded execution time on the
      operation. And seeing that this is a RT operation, this is somewhat
      important.
      
      While in practise; given the previous patch; it will be very unlikely to
      ever really take more than one or two rounds, proving so becomes rather
      hard.
      
      However, now that modifying wait_list is done while holding both hb->lock
      and wait_lock, the scenario can be avoided entirely by acquiring wait_lock
      while still holding hb-lock. Doing a hand-over, without leaving a hole.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104152.112378812@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      bebe5b51
    • Peter Zijlstra's avatar
      futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock() · cfafcd11
      Peter Zijlstra authored
      By changing futex_lock_pi() to use rt_mutex_*_proxy_lock() all wait_list
      modifications are done under both hb->lock and wait_lock.
      
      This closes the obvious interleave pattern between futex_lock_pi() and
      futex_unlock_pi(), but not entirely so. See below:
      
      Before:
      
      futex_lock_pi()			futex_unlock_pi()
        unlock hb->lock
      
      				  lock hb->lock
      				  unlock hb->lock
      
      				  lock rt_mutex->wait_lock
      				  unlock rt_mutex_wait_lock
      				    -EAGAIN
      
        lock rt_mutex->wait_lock
        list_add
        unlock rt_mutex->wait_lock
      
        schedule()
      
        lock rt_mutex->wait_lock
        list_del
        unlock rt_mutex->wait_lock
      
      				  <idem>
      				    -EAGAIN
      
        lock hb->lock
      
      
      After:
      
      futex_lock_pi()			futex_unlock_pi()
      
        lock hb->lock
        lock rt_mutex->wait_lock
        list_add
        unlock rt_mutex->wait_lock
        unlock hb->lock
      
        schedule()
      				  lock hb->lock
      				  unlock hb->lock
        lock hb->lock
        lock rt_mutex->wait_lock
        list_del
        unlock rt_mutex->wait_lock
      
      				  lock rt_mutex->wait_lock
      				  unlock rt_mutex_wait_lock
      				    -EAGAIN
      
        unlock hb->lock
      
      
      It does however solve the earlier starvation/live-lock scenario which got
      introduced with the -EAGAIN since unlike the before scenario; where the
      -EAGAIN happens while futex_unlock_pi() doesn't hold any locks; in the
      after scenario it happens while futex_unlock_pi() actually holds a lock,
      and then it is serialized on that lock.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104152.062785528@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      cfafcd11
    • Peter Zijlstra's avatar
      futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock() · 38d589f2
      Peter Zijlstra authored
      With the ultimate goal of keeping rt_mutex wait_list and futex_q waiters
      consistent it's necessary to split 'rt_mutex_futex_lock()' into finer
      parts, such that only the actual blocking can be done without hb->lock
      held.
      
      Split split_mutex_finish_proxy_lock() into two parts, one that does the
      blocking and one that does remove_waiter() when the lock acquire failed.
      
      When the rtmutex was acquired successfully the waiter can be removed in the
      acquisiton path safely, since there is no concurrency on the lock owner.
      
      This means that, except for futex_lock_pi(), all wait_list modifications
      are done with both hb->lock and wait_lock held.
      
      [bigeasy@linutronix.de: fix for futex_requeue_pi_signal_restart]
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104152.001659630@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      38d589f2
    • Peter Zijlstra's avatar
      futex,rt_mutex: Introduce rt_mutex_init_waiter() · 50809358
      Peter Zijlstra authored
      Since there's already two copies of this code, introduce a helper now
      before adding a third one.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.950039479@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      50809358
    • Peter Zijlstra's avatar
      futex: Pull rt_mutex_futex_unlock() out from under hb->lock · 16ffa12d
      Peter Zijlstra authored
      There's a number of 'interesting' problems, all caused by holding
      hb->lock while doing the rt_mutex_unlock() equivalient.
      
      Notably:
      
       - a PI inversion on hb->lock; and,
      
       - a SCHED_DEADLINE crash because of pointer instability.
      
      The previous changes:
      
       - changed the locking rules to cover {uval,pi_state} with wait_lock.
      
       - allow to do rt_mutex_futex_unlock() without dropping wait_lock; which in
         turn allows to rely on wait_lock atomicity completely.
      
       - simplified the waiter conundrum.
      
      It's now sufficient to hold rtmutex::wait_lock and a reference on the
      pi_state to protect the state consistency, so hb->lock can be dropped
      before calling rt_mutex_futex_unlock().
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.900002056@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      16ffa12d
    • Peter Zijlstra's avatar
      futex: Rework inconsistent rt_mutex/futex_q state · 73d786bd
      Peter Zijlstra authored
      There is a weird state in the futex_unlock_pi() path when it interleaves
      with a concurrent futex_lock_pi() at the point where it drops hb->lock.
      
      In this case, it can happen that the rt_mutex wait_list and the futex_q
      disagree on pending waiters, in particular rt_mutex will find no pending
      waiters where futex_q thinks there are. In this case the rt_mutex unlock
      code cannot assign an owner.
      
      The futex side fixup code has to cleanup the inconsistencies with quite a
      bunch of interesting corner cases.
      
      Simplify all this by changing wake_futex_pi() to return -EAGAIN when this
      situation occurs. This then gives the futex_lock_pi() code the opportunity
      to continue and the retried futex_unlock_pi() will now observe a coherent
      state.
      
      The only problem is that this breaks RT timeliness guarantees. That
      is, consider the following scenario:
      
        T1 and T2 are both pinned to CPU0. prio(T2) > prio(T1)
      
          CPU0
      
          T1
            lock_pi()
            queue_me()  <- Waiter is visible
      
          preemption
      
          T2
            unlock_pi()
      	loops with -EAGAIN forever
      
      Which is undesirable for PI primitives. Future patches will rectify
      this.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.850383690@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      73d786bd
    • Peter Zijlstra's avatar
      futex: Cleanup refcounting · bf92cf3a
      Peter Zijlstra authored
      Add a put_pit_state() as counterpart for get_pi_state() so the refcounting
      becomes consistent.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.801778516@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      bf92cf3a
    • Peter Zijlstra's avatar
      futex: Change locking rules · 734009e9
      Peter Zijlstra authored
      Currently futex-pi relies on hb->lock to serialize everything. But hb->lock
      creates another set of problems, especially priority inversions on RT where
      hb->lock becomes a rt_mutex itself.
      
      The rt_mutex::wait_lock is the most obvious protection for keeping the
      futex user space value and the kernel internal pi_state in sync.
      
      Rework and document the locking so rt_mutex::wait_lock is held accross all
      operations which modify the user space value and the pi state.
      
      This allows to invoke rt_mutex_unlock() (including deboost) without holding
      hb->lock as a next step.
      
      Nothing yet relies on the new locking rules.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.751993333@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      734009e9
    • Peter Zijlstra's avatar
      futex,rt_mutex: Provide futex specific rt_mutex API · 5293c2ef
      Peter Zijlstra authored
      Part of what makes futex_unlock_pi() intricate is that
      rt_mutex_futex_unlock() -> rt_mutex_slowunlock() can drop
      rt_mutex::wait_lock.
      
      This means it cannot rely on the atomicy of wait_lock, which would be
      preferred in order to not rely on hb->lock so much.
      
      The reason rt_mutex_slowunlock() needs to drop wait_lock is because it can
      race with the rt_mutex fastpath, however futexes have their own fast path.
      
      Since futexes already have a bunch of separate rt_mutex accessors, complete
      that set and implement a rt_mutex variant without fastpath for them.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.702962446@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      5293c2ef
    • Peter Zijlstra's avatar
      futex: Remove rt_mutex_deadlock_account_*() · fffa954f
      Peter Zijlstra authored
      These are unused and clutter up the code.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.652692478@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      fffa954f
    • Peter Zijlstra's avatar
      futex: Use smp_store_release() in mark_wake_futex() · 1b367ece
      Peter Zijlstra authored
      Since the futex_q can dissapear the instruction after assigning NULL,
      this really should be a RELEASE barrier. That stops loads from hitting
      dead memory too.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.604296452@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      1b367ece
    • Peter Zijlstra's avatar
      futex: Cleanup variable names for futex_top_waiter() · 499f5aca
      Peter Zijlstra authored
      futex_top_waiter() returns the top-waiter on the pi_mutex. Assinging
      this to a variable 'match' totally obscures the code.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.554710645@infradead.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      499f5aca
    • Peter Zijlstra's avatar
      locking/atomic/x86: Use atomic_try_cmpxchg() · e6790e4b
      Peter Zijlstra authored
      Better code generation:
      
            text           data  bss        name
        10665111        4530096  843776     defconfig-build/vmlinux.3
        10655703        4530096  843776     defconfig-build/vmlinux.4
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e6790e4b
    • Peter Zijlstra's avatar
      locking/refcounts: Use atomic_try_cmpxchg() · b78c0d47
      Peter Zijlstra authored
      Generates better code (GCC-6.2.1):
      
        text        filename
        1576        defconfig-build/lib/refcount.o.pre
        1488        defconfig-build/lib/refcount.o.post
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b78c0d47
    • Peter Zijlstra's avatar
      locking/atomic: Introduce atomic_try_cmpxchg() · a9ebf306
      Peter Zijlstra authored
      Add a new cmpxchg interface:
      
        bool try_cmpxchg(u{8,16,32,64} *ptr, u{8,16,32,64} *val, u{8,16,32,64} new);
      
      Where the boolean returns the result of the compare; and thus if the
      exchange happened; and in case of failure, the new value of *ptr is
      returned in *val.
      
      This allows simplification/improvement of loops like:
      
      	for (;;) {
      		new = val $op $imm;
      		old = cmpxchg(ptr, val, new);
      		if (old == val)
      			break;
      		val = old;
      	}
      
      into:
      
      	do {
      	} while (!try_cmpxchg(ptr, &val, val $op $imm));
      
      while also generating better code (GCC6 and onwards).
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a9ebf306
  4. 16 Mar, 2017 7 commits
  5. 14 Mar, 2017 2 commits
  6. 13 Mar, 2017 1 commit
  7. 12 Mar, 2017 5 commits
    • Linus Torvalds's avatar
      Linux 4.11-rc2 · 4495c08e
      Linus Torvalds authored
      4495c08e
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 56b24d1b
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
      
       - four patches to get the new cputime code in shape for s390
      
       - add the new statx system call
      
       - a few bug fixes
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: wire up statx system call
        KVM: s390: Fix guest migration for huge guests resulting in panic
        s390/ipl: always use load normal for CCW-type re-IPL
        s390/timex: micro optimization for tod_to_ns
        s390/cputime: provide archicture specific cputime_to_nsecs
        s390/cputime: reset all accounting fields on fork
        s390/cputime: remove last traces of cputime_t
        s390: fix in-kernel program checks
        s390/crypt: fix missing unlock in ctr_paes_crypt on error path
      56b24d1b
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5a45a5a8
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
      
       - a fix for the kexec/purgatory regression which was introduced in the
         merge window via an innocent sparse fix. We could have reverted that
         commit, but on deeper inspection it turned out that the whole
         machinery is neither documented nor robust. So a proper cleanup was
         done instead
      
       - the fix for the TLB flush issue which was discovered recently
      
       - a simple typo fix for a reboot quirk
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/tlb: Fix tlb flushing when lguest clears PGE
        kexec, x86/purgatory: Unbreak it and clean it up
        x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk
      5a45a5a8
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ecade114
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
      
       - a workaround for a GIC erratum
      
       - a missing stub function for CONFIG_IRQDOMAIN=n
      
       - fixes for a couple of type inconsistencies
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/crossbar: Fix incorrect type of register size
        irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065
        irqdomain: Add empty irq_domain_check_msi_remap
        irqchip/crossbar: Fix incorrect type of local variables
      ecade114
    • Daniel Borkmann's avatar
      x86/tlb: Fix tlb flushing when lguest clears PGE · 2c4ea6e2
      Daniel Borkmann authored
      Fengguang reported random corruptions from various locations on x86-32
      after commits d2852a22 ("arch: add ARCH_HAS_SET_MEMORY config") and
      9d876e79 ("bpf: fix unlocking of jited image when module ronx not set")
      that uses the former. While x86-32 doesn't have a JIT like x86_64, the
      bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to
      ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module
      support built in and therefore never had the DEBUG_SET_MODULE_RONX setting
      enabled.
      
      After investigating the crashes further, it turned out that using
      set_memory_ro() and set_memory_rw() didn't have the desired effect, for
      example, setting the pages as read-only on x86-32 would still let
      probe_kernel_write() succeed without error. This behavior would manifest
      itself in situations where the vmalloc'ed buffer was accessed prior to
      set_memory_*() such as in case of bpf_prog_alloc(). In cases where it
      wasn't, the page attribute changes seemed to have taken effect, leading to
      the conclusion that a TLB invalidate didn't happen. Moreover, it turned out
      that this issue reproduced with qemu in "-cpu kvm64" mode, but not for
      "-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger
      a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(),
      though.
      
      There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends
      on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush
      (depends on X86_FEATURE_PGE), and cr3 based flush.  For "-cpu host" case in
      my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu
      kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually
      worked fine, and further investigating the cr4 one turned out that
      X86_CR4_PGE bit was not set in cr4 register, meaning the
      __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same
      value instead of clearing X86_CR4_PGE in the first write to trigger the
      flush.
      
      It turned out that X86_CR4_PGE was cleared from cr4 during init from
      lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also
      cleared from there due to concerns of using PGE in guest kernel that can
      lead to hard to trace bugs (see bff672e6 ("lguest: documentation V:
      Host") in init()). The CPU feature bits are cleared in dynamic
      boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses
      static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB
      flushing to use, meaning they still used the old setting of the host
      kernel.
      
      Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate
      to static_cpu_has() checks is too late at this point as sections have been
      patched already, so for now, it seems reasonable to switch back to
      boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95
      ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via
      cr3 as originally intended, properly makes the new page attributes visible
      and thus fixes the crashes seen by Fengguang.
      
      Fixes: c109bf95 ("x86/cpufeature: Remove cpu_has_pge")
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: bp@suse.de
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: lkp@01.org
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com
      Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.netSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      2c4ea6e2
  8. 11 Mar, 2017 6 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 106e4da6
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "ARM updates from Marc Zyngier:
         - vgic updates:
           - Honour disabling the ITS
           - Don't deadlock when deactivating own interrupts via MMIO
           - Correctly expose the lact of IRQ/FIQ bypass on GICv3
      
         - I/O virtualization:
           - Make KVM_CAP_NR_MEMSLOTS big enough for large guests with many
             PCIe devices
      
         - General bug fixes:
           - Gracefully handle exception generated with syndroms that the host
             doesn't understand
           - Properly invalidate TLBs on VHE systems
      
        x86:
         - improvements in emulation of VMCLEAR, VMX MSR bitmaps, and VCPU
           reset
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: nVMX: do not warn when MSR bitmap address is not backed
        KVM: arm64: Increase number of user memslots to 512
        KVM: arm/arm64: Remove KVM_PRIVATE_MEM_SLOTS definition that are unused
        KVM: arm/arm64: Enable KVM_CAP_NR_MEMSLOTS on arm/arm64
        KVM: Add documentation for KVM_CAP_NR_MEMSLOTS
        KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled
        arm64: KVM: Survive unknown traps from guests
        arm: KVM: Survive unknown traps from guests
        KVM: arm/arm64: Let vcpu thread modify its own active state
        KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset
        kvm: nVMX: VMCLEAR should not cause the vCPU to shut down
        KVM: arm/arm64: vgic-v3: Don't pretend to support IRQ/FIQ bypass
        arm64: KVM: VHE: Clear HCR_TGE when invalidating guest TLBs
      106e4da6
    • Linus Torvalds's avatar
      Merge tag 'extable-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux · 4b050f22
      Linus Torvalds authored
      Pull extable.h fix from Paul Gortmaker:
       "Fixup for arch/score after extable.h introduction.
      
        It seems that Guenter is the only one on the planet doing builds for
        arch/score -- we don't have compile coverage for it in linux-next or
        in the kbuild-bot either. Guenter couldn't even recall where he got
        his toolchain, but was kind enough to share it with me so I could
        validate this change and also add arch/score to my build coverage.
      
        I sat on this a bit in case there was any other fallout in other arch
        dirs, but since this still seems to be the only one, I might as well
        send it on its way"
      
      * tag 'extable-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
        score: Fix implicit includes now failing build after extable change
      4b050f22
    • Linus Torvalds's avatar
      Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random · 84c37c16
      Linus Torvalds authored
      Pull random updates from Ted Ts'o:
       "Change get_random_{int,log} to use the CRNG used by /dev/urandom and
        getrandom(2). It's faster and arguably more secure than cut-down MD5
        that we had been using.
      
        Also do some code cleanup"
      
      * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
        random: move random_min_urandom_seed into CONFIG_SYSCTL ifdef block
        random: convert get_random_int/long into get_random_u32/u64
        random: use chacha20 for get_random_int/long
        random: fix comment for unused random_min_urandom_seed
        random: remove variable limit
        random: remove stale urandom_init_wait
        random: remove stale maybe_reseed_primary_crng
      84c37c16
    • Guenter Roeck's avatar
      score: Fix implicit includes now failing build after extable change · 0acf6119
      Guenter Roeck authored
      After changing from module.h to extable.h, score builds fail with:
      
        arch/score/kernel/traps.c: In function 'do_ri':
        arch/score/kernel/traps.c:248:4: error: implicit declaration of function 'user_disable_single_step'
        arch/score/mm/extable.c: In function 'fixup_exception':
        arch/score/mm/extable.c:32:38: error: dereferencing pointer to incomplete type
        arch/score/mm/extable.c:34:24: error: dereferencing pointer to incomplete type
      
      because extable.h doesn't drag in the same amount of headers as the
      module.h did.  Add in the headers which were implicitly expected.
      
      Fixes: 90858794 ("module.h: remove extable.h include now users have migrated")
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      [PG: tweak commit log; refresh for sched header refactoring.]
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      0acf6119
    • Linus Torvalds's avatar
      Merge tag 'tty-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 434fd635
      Linus Torvalds authored
      Pull tty/serial fixes frpm Greg KH:
       "Here are two bugfixes for tty stuff for 4.11-rc2.
      
        One of them resolves the pretty bad bug in the n_hdlc code that
        Alexander Popov found and fixed and has been reported everywhere. The
        other just fixes a samsung serial driver issue when DMA fails on some
        systems.
      
        Both have been in linux-next with no reported issues"
      
      * tag 'tty-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        serial: samsung: Continue to work if DMA request fails
        tty: n_hdlc: get rid of racy n_hdlc.tbuf
      434fd635
    • Linus Torvalds's avatar
      Merge tag 'staging-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 85298808
      Linus Torvalds authored
      Pull staging driver fixes from Greg KH:
       "Here are two small build warning fixes for some staging drivers that
        Arnd has found on his valiant quest to get the kernel to build
        properly with no warnings.
      
        Both of these have been in linux-next this week and resolve the
        reported issues"
      
      * tag 'staging-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: octeon: remove unused variable
        staging/vc04_services: add CONFIG_OF dependency
      85298808