    x86/speculation: Prevent stale SPEC_CTRL msr content · 6d991ba5
    Thomas Gleixner authored
    The seccomp speculation control operates on all tasks of a process, but
    only the current task of a process can update the MSR immediately. For the
    other threads the update is deferred to the next context switch.
    
    This creates the following situation with Process A and B:
    
    Process A task 2 and Process B task 1 are pinned on CPU1. Process A task 2
    does not have the speculation control TIF bit set. Process B task 1 has the
    speculation control TIF bit set.
    
    CPU0					CPU1
    					MSR bit is set
    					ProcB.T1 schedules out
    					ProcA.T2 schedules in
    					MSR bit is cleared
    ProcA.T1
      seccomp_update()
      set TIF bit on ProcA.T2
    					ProcB.T1 schedules in
    					MSR is not updated  <-- FAIL
    
    This happens because the context switch code tries to avoid the MSR update
    if the speculation control TIF bits of the incoming and the outgoing task
    are the same. In the worst case ProcB.T1 and ProcA.T2 are the only tasks
    scheduling back and forth on CPU1, which keeps the MSR stale forever.
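
The sequence above can be reproduced in a small user-space model. This is only an illustrative sketch, not kernel code: `struct task`, `struct cpu`, `switch_to()` and `stale_after_remote_update()` are hypothetical names, and the MSR is reduced to a single boolean. It assumes the context switch writes the MSR only when the TIF bits of the outgoing and incoming task differ, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these names exist in the kernel. */
struct task {
	bool tif_spec;	/* models the speculation control TIF bit */
};

struct cpu {
	bool msr_bit;	/* models the corresponding SPEC_CTRL MSR bit */
};

/* Models the optimization: write the MSR only if the TIF bits differ. */
static void switch_to(struct cpu *cpu, const struct task *prev,
		      const struct task *next)
{
	if (prev->tif_spec != next->tif_spec)
		cpu->msr_bit = next->tif_spec;
}

/* Replays the CPU1 timeline; returns true if the MSR ends up stale. */
bool stale_after_remote_update(void)
{
	struct task a2 = { .tif_spec = false };	/* ProcA.T2 */
	struct task b1 = { .tif_spec = true };	/* ProcB.T1 */
	struct cpu cpu1 = { .msr_bit = true };	/* ProcB.T1 is running */

	switch_to(&cpu1, &b1, &a2);	/* bits differ: MSR bit is cleared */
	assert(!cpu1.msr_bit);

	a2.tif_spec = true;		/* remote update issued from CPU0 */

	switch_to(&cpu1, &a2, &b1);	/* bits are equal: no MSR write */

	/* ProcB.T1 wants the bit set, but the MSR still has it cleared. */
	return cpu1.msr_bit != b1.tif_spec;
}
```

In this model `stale_after_remote_update()` returns true, and further switches between the two tasks never write the MSR again because their TIF bits stay equal.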
    
    In theory this could be remedied by IPIs, but chasing the remote task which
    could be migrated is complex and full of races.
    
    The straightforward solution is to avoid the asynchronous update of the TIF
    bit and defer it to the next context switch. The speculation control state
    is stored in task_struct::atomic_flags by the prctl and seccomp updates
    already.
    
    Add a new TIF_SPEC_FORCE_UPDATE bit and set this after updating the
    atomic_flags. Check the bit on context switch and force a synchronous
    update of the speculation control if set. Use the same mechanism for
    updating the current task.
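
The deferred update can be sketched in the same kind of user-space model. Again this is an assumption-laden illustration rather than the actual patch: `spec_update()`, `switch_to()` and `msr_correct_with_forced_update()` are hypothetical helpers, `spec_state` stands in for the state kept in task_struct::atomic_flags, and `tif_force_update` stands in for TIF_SPEC_FORCE_UPDATE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these names exist in the kernel. */
struct task {
	bool spec_state;	/* models the state in atomic_flags */
	bool tif_spec;		/* models the speculation control TIF bit */
	bool tif_force_update;	/* models TIF_SPEC_FORCE_UPDATE */
};

struct cpu {
	bool msr_bit;		/* models the corresponding SPEC_CTRL MSR bit */
};

/* Models a prctl/seccomp update: record the state, defer the TIF change. */
static void spec_update(struct task *t, bool on)
{
	t->spec_state = on;
	t->tif_force_update = true;
}

/* Models the context switch with the forced synchronous update. */
static void switch_to(struct cpu *cpu, const struct task *prev,
		      struct task *next)
{
	if (next->tif_force_update) {
		/* Sync the TIF bit from the stored state, always write MSR. */
		next->tif_force_update = false;
		next->tif_spec = next->spec_state;
		cpu->msr_bit = next->tif_spec;
		return;
	}
	if (prev->tif_spec != next->tif_spec)
		cpu->msr_bit = next->tif_spec;
}

/* Replays the CPU1 timeline; returns true if the MSR is always correct. */
bool msr_correct_with_forced_update(void)
{
	struct task a2 = { false, false, false };	/* ProcA.T2 */
	struct task b1 = { true, true, false };		/* ProcB.T1 */
	struct cpu cpu1 = { .msr_bit = true };		/* ProcB.T1 is running */
	bool ok;

	switch_to(&cpu1, &b1, &a2);	/* bits differ: MSR bit is cleared */
	assert(!cpu1.msr_bit);

	spec_update(&a2, true);		/* remote update: TIF bit untouched */

	switch_to(&cpu1, &a2, &b1);	/* bits still differ: MSR bit is set */
	ok = (cpu1.msr_bit == b1.tif_spec);

	switch_to(&cpu1, &b1, &a2);	/* forced update for ProcA.T2 */
	return ok && a2.tif_spec && cpu1.msr_bit;
}
```

Because the remote update no longer changes the TIF bit of a task that may be running elsewhere, the comparison in the context switch stays consistent with what was last written to the MSR.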

    Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Jiri Kosina <jkosina@suse.cz>
    Cc: Tom Lendacky <thomas.lendacky@amd.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: David Woodhouse <dwmw@amazon.co.uk>
    Cc: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Casey Schaufler <casey.schaufler@intel.com>
    Cc: Asit Mallick <asit.k.mallick@intel.com>
    Cc: Arjan van de Ven <arjan@linux.intel.com>
    Cc: Jon Masters <jcm@redhat.com>
    Cc: Waiman Long <longman9394@gmail.com>
    Cc: Greg KH <gregkh@linuxfoundation.org>
    Cc: Dave Stewart <david.c.stewart@intel.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811272247140.1875@nanos.tec.linutronix.de
spec-ctrl.h 2.81 KB