1. 07 Jul, 2015 9 commits
    • x86/entry: Add new, comprehensible entry and exit handlers written in C · c5c46f59
      Andy Lutomirski authored
      The current x86 entry and exit code, written in a mixture of assembly and
      C code, is incomprehensible due to being open-coded in a lot of places
      without coherent documentation.
      
      It appears to work primarily by luck and duct tape: i.e. obvious runtime
      failures were fixed on-demand, without re-thinking the design.
      
      For those reasons our confidence level in that code is low, and it is
      very difficult to incrementally improve.
      
      Add new code written in C, in preparation for simply deleting the old
      entry code.
      
      prepare_exit_to_usermode() is a new function that will handle all
      slow path exits to user mode.  It is called with IRQs disabled
      and it leaves us in a state in which it is safe to immediately
      return to user mode.  IRQs must not be re-enabled at any point
      between the return of prepare_exit_to_usermode() and the actual
      entry to user mode. (We can, of course, fail to enter user mode and treat
      that failure as a fresh entry to kernel mode.)
      
      All callers of do_notify_resume() will be migrated to call
      prepare_exit_to_usermode() instead; prepare_exit_to_usermode() needs
      to do everything that do_notify_resume() does today, but it also
      takes care of scheduling and context tracking.  Unlike
      do_notify_resume(), it does not need to be called in a loop.
      
      syscall_return_slowpath() is exactly what it sounds like: it will
      be called on any syscall exit slow path. It will replace
      syscall_trace_leave() and it calls prepare_exit_to_usermode() on the
      way out.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/c57c8b87661a4152801d7d3786eac2d1a2f209dd.1435952415.git.luto@kernel.org
      [ Improved the changelog a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
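The contract described for prepare_exit_to_usermode() in the commit above can be pictured with a small userspace model in C. This is only a sketch, not the kernel implementation: the flag names, `struct task_model`, and `model_prepare_exit_to_usermode()` are hypothetical, and the rule that handling a signal queues further work is an assumption made purely so that the loop has a reason to run twice.

```c
#include <stdbool.h>

/* Hypothetical work flags; they only loosely resemble kernel TIF_* bits. */
enum {
	WORK_NEED_RESCHED  = 1u << 0,
	WORK_NOTIFY_RESUME = 1u << 1,
	WORK_SIGPENDING    = 1u << 2,
};

struct task_model {
	unsigned int flags;  /* exit work still pending */
	bool irqs_enabled;   /* modelled IRQ state */
	int passes;          /* number of loop iterations performed */
};

/*
 * Model of the described contract: entered with IRQs disabled, returns
 * with IRQs disabled and no pending work. The loop lives inside the
 * function, so the caller does not need to loop.
 */
static void model_prepare_exit_to_usermode(struct task_model *t)
{
	while (t->flags != 0) {
		t->irqs_enabled = true;  /* work is done with IRQs on */
		t->passes++;

		if (t->flags & WORK_NEED_RESCHED)
			t->flags &= ~WORK_NEED_RESCHED;   /* stands in for scheduling */

		if (t->flags & WORK_NOTIFY_RESUME)
			t->flags &= ~WORK_NOTIFY_RESUME;  /* stands in for resume work */

		if (t->flags & WORK_SIGPENDING) {
			t->flags &= ~WORK_SIGPENDING;     /* stands in for signal delivery */
			/* Assumption for the model: handling one item queues another. */
			t->flags |= WORK_NOTIFY_RESUME;
		}

		t->irqs_enabled = false; /* re-check the flags with IRQs off */
	}
}
```

A caller would initialise a `struct task_model` with some pending flags and IRQs disabled, call the function once, and then be free to "return to user mode" because no work remains and IRQs are still off.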
    • x86/entry: Add enter_from_user_mode() and use it in syscalls · feed36cd
      Andy Lutomirski authored
      Changing the x86 context tracking hooks is dangerous because
      there are no good checks that we track our context correctly.
      Add a helper to check that we're actually in CONTEXT_USER when
      we enter from user mode and wire it up for syscall entries.
      
      Subsequent patches will wire this up for all non-NMI entries as
      well.  NMIs are their own special beast and cannot currently
      switch overall context tracking state.  Instead, they have their
      own special RCU hooks.
      
      This is a tiny speedup if !CONFIG_CONTEXT_TRACKING (removes a
      branch) and a tiny slowdown if CONFIG_CONTEXT_TRACKING (adds a
      layer of indirection).  Eventually, we should fix up the core
      context tracking code to supply a function that does what we
      want (and can be much simpler than user_exit), which will enable
      us to get rid of the extra call.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/853b42420066ec3fb856779cdc223a6dcb5d355b.1435952415.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
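The idea behind the entry helper in the commit above (check that the tracked context really is "user" before switching it to "kernel") can be modelled in userspace C. This is a sketch under stated assumptions, not the kernel code: `cur_state`, `violations`, and both `model_*` functions are hypothetical names, and a counter stands in for the warning the kernel would print.

```c
/* Hypothetical two-state model of the tracked context. */
enum ctx_state_model { CTX_KERNEL, CTX_USER };

static enum ctx_state_model cur_state = CTX_USER;
static int violations;  /* counts entries that found an unexpected state */

/* Called on every modelled entry from user mode. */
static void model_enter_from_user_mode(void)
{
	if (cur_state != CTX_USER)
		violations++;        /* the real helper would warn here */
	cur_state = CTX_KERNEL;  /* stands in for the context switch hook */
}

/* Called on every modelled return to user mode. */
static void model_return_to_user_mode(void)
{
	cur_state = CTX_USER;
}
```

In this model a correctly paired entry and return never touches `violations`, while a second entry without an intervening return is counted, which is the kind of bookkeeping bug the assertion is meant to expose.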
    • x86/traps, context_tracking: Assert that we're in CONTEXT_KERNEL in exception entries · 02fdcd5e
      Andy Lutomirski authored
      Other than the super-atomic exception entries, all exception
      entries are supposed to switch our context tracking state to
      CONTEXT_KERNEL. Assert that they do.  These assertions appear
      trivial at this point, as exception_enter() is the function
      responsible for switching context, but I'm planning on reworking
      x86's exception context tracking, and these assertions will help
      make sure that all of this code keeps working.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/20fa1ee2d943233a184aaf96ff75394d3b34dfba.1435952415.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry: Move C entry and exit code to arch/x86/entry/common.c · 1f484aa6
      Andy Lutomirski authored
      The entry and exit C helpers were confusingly scattered between
      ptrace.c and signal.c, even though they aren't specific to
      ptrace or signal handling.  Move them together in a new file.
      
      This change just moves code around.  It doesn't change anything.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/324d686821266544d8572423cc281f961da445f4.1435952415.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • notifiers, RCU: Assert that RCU is watching in notify_die() · e727c7d7
      Andy Lutomirski authored
      Low-level arch entries often call notify_die(), and it's easy for
      arch code to fail to exit an RCU quiescent state first.  Assert
      that we're not quiescent in notify_die().
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/1f5fe6c23d5b432a23267102f2d72b787d80fdd8.1435952415.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • context_tracking: Add ct_state() and CT_WARN_ON() · f9281648
      Andy Lutomirski authored
      This will let us sprinkle sanity checks around the kernel
      without making too much of a mess.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/5da41fb2ceb29eac671f427c67040401ba2a1fa0.1435952415.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
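To illustrate the kind of sanity check the commit above introduces, here is a userspace sketch in C of a state query plus a warning macro that stays silent when tracking is off. The names (`model_ct_state()`, `MODEL_CT_WARN_ON()`, `tracking_enabled`, `warnings`) are hypothetical stand-ins, and the behaviour shown is an assumption about the general pattern rather than a copy of the kernel's definitions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical states; MODEL_DISABLED means "tracking is off". */
enum model_ctx { MODEL_DISABLED = -1, MODEL_KERNEL = 0, MODEL_USER = 1 };

static bool tracking_enabled;                        /* runtime switch */
static enum model_ctx tracked_state = MODEL_KERNEL;  /* last recorded state */
static int warnings;                                 /* number of failed checks */

/* Report the tracked state, or MODEL_DISABLED when tracking is off. */
static enum model_ctx model_ct_state(void)
{
	return tracking_enabled ? tracked_state : MODEL_DISABLED;
}

/* Warn only when tracking is enabled and the condition holds. */
#define MODEL_CT_WARN_ON(cond)                                          \
	do {                                                            \
		if (tracking_enabled && (cond)) {                       \
			warnings++;                                     \
			fprintf(stderr, "context check failed: %s\n",   \
				#cond);                                 \
		}                                                       \
	} while (0)
```

A check such as `MODEL_CT_WARN_ON(model_ct_state() != MODEL_KERNEL)` can then be placed at any point that expects kernel context. Gating the macro on `tracking_enabled` keeps such checks from firing on configurations where the state is not maintained at all.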
    • um: Fix do_signal() prototype · ccaee5f8
      Ingo Molnar authored
      Once x86 exports its do_signal(), the prototypes will clash.
      
      Fix the clash and also improve the code a bit: remove the
      unnecessary kern_do_signal() indirection. This allows
      interrupt_end() to share the 'regs' parameter calculation.
      
      Also remove the unused return code to match x86.
      
      Minimally build and boot tested.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard.weinberger@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/67c57eac09a589bac3c6c5ff22f9623ec55a184a.1435952415.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/64/compat: Fix bad fast syscall arg failure path · 5e99cb7c
      Andy Lutomirski authored
      If user code does SYSCALL32 or SYSENTER without a valid stack,
      then our attempt to determine the syscall args will result in a
      failed uaccess fault.  Previously, we would try to recover by
      jumping to the syscall exit code, but we'd run the syscall exit
      work even though we never made it to the syscall entry work.
      
      Clean it up by treating the failure path as a non-syscall entry
      and exit pair.
      
      This fixes strace's output when running the syscall_arg_fault
      test. Without this fix, strace would get out of sync and would
      fail to associate syscall entries with syscall exits.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/903010762c07a3d67df914fea2da84b52b0f8f1d.1435952415.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry, selftests/x86: Add a test for 32-bit fast syscall arg faults · 5e5c684a
      Andy Lutomirski authored
      This test passes on 4.0 and fails on some newer kernels.
      Fortunately, the failure is likely not a big deal.
      
      This test will make sure that we don't break it further (e.g. OOPSing)
      as we clean up the entry code and that we eventually fix the
      regression.
      
      There's arguably no need to preserve the old ABI here --
      anything that makes it into a fast (vDSO) syscall with a bad
      stack is about to crash no matter what we do.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/9cfcc51005168cb1b06b31991931214d770fc59a.1435952415.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 06 Jul, 2015 31 commits