1. 30 Nov, 2020 11 commits
    • arm64: entry: fix EL1 debug transitions · 2a9b3e6a
      Mark Rutland authored
      In debug_exception_enter() and debug_exception_exit() we trace hardirqs
      on/off while RCU isn't guaranteed to be watching, and we don't save and
      restore the hardirq state, and so may return with this having changed.
      
      Handle this appropriately with new entry/exit helpers which do the bare
      minimum to ensure this is appropriately maintained, without marking
      debug exceptions as NMIs. These are placed in entry-common.c with the
      other entry/exit helpers.
      
      In future we'll want to reconsider whether some debug exceptions should
      be NMIs, but this will require a significant refactoring, and for now
      this should prevent issues with lockdep and RCU.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201130115950.22492-12-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: entry: fix NMI {user, kernel}->kernel transitions · f0cd5ac1
      Mark Rutland authored
      Exceptions which can be taken at (almost) any time are considered to be
      NMIs. On arm64 that includes:
      
      * SDEI events
      * GICv3 Pseudo-NMIs
      * Kernel stack overflows
      * Unexpected/unhandled exceptions
      
      ... but currently debug exceptions (BRKs, breakpoints, watchpoints,
      single-step) are not considered NMIs.
      
      As these can be taken at any time, kernel features (lockdep, RCU,
      ftrace) may not be in a consistent kernel state. For example, we may
      take an NMI from the idle code or partway through an entry/exit path.
      
      While nmi_enter() and nmi_exit() handle most of this state, notably they
      don't save/restore the lockdep state across an NMI being taken and
      handled. When interrupts are enabled and an NMI is taken, lockdep may
      see interrupts become disabled within the NMI code, but not see
      interrupts become enabled when returning from the NMI, leaving lockdep
      believing interrupts are disabled when they are actually enabled.
      
      The x86 code handles this in idtentry_{enter,exit}_nmi(), which will
      shortly be moved to the generic entry code. As we can't use either yet,
      we copy the x86 approach in arm64-specific helpers. All the NMI
      entrypoints are marked as noinstr to prevent any instrumentation
      handling code being invoked before the state has been corrected.
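      
      As a rough illustration of the save/restore described above, the
      following is a user-space toy model, not the arm64 helpers themselves.
      The names lockdep_hardirqs_on, struct nmi_entry_state, model_nmi_enter()
      and model_nmi_exit() are hypothetical and exist only for this sketch:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for lockdep's belief about whether hardirqs are enabled. */
static bool lockdep_hardirqs_on = true;

/* Hypothetical per-NMI record of the state found on entry. */
struct nmi_entry_state {
	bool lockdep_was_on;
};

/* Entry: remember lockdep's view, then record that IRQs are now masked. */
static void model_nmi_enter(struct nmi_entry_state *st)
{
	st->lockdep_was_on = lockdep_hardirqs_on;
	lockdep_hardirqs_on = false;
}

/* Exit: restore lockdep's view so it matches the interrupted context. */
static void model_nmi_exit(const struct nmi_entry_state *st)
{
	/* While the NMI is being handled, the model always sees IRQs as off. */
	assert(!lockdep_hardirqs_on);
	lockdep_hardirqs_on = st->lockdep_was_on;
}
```

      Without the restore in model_nmi_exit(), an NMI taken with interrupts
      enabled would leave the model claiming they are disabled, which is the
      mismatch described above.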
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201130115950.22492-11-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: entry: fix non-NMI kernel<->kernel transitions · 7cd1ea10
      Mark Rutland authored
      There are periods in kernel mode when RCU is not watching and/or the
      scheduler tick is disabled, but we can still take exceptions such as
      interrupts. The arm64 exception handlers do not account for this, and
      it's possible that RCU is not watching while an exception handler runs.
      
      The x86/generic entry code handles this by ensuring that all (non-NMI)
      kernel exception handlers call irqentry_enter() and irqentry_exit(),
      which handle RCU, lockdep, and IRQ flag tracing. We can't yet move to
      the generic entry code, and already handle the user<->kernel transitions
      elsewhere, so we add new kernel<->kernel transition helpers along the
      lines of the generic entry code.
      
      Since we now track interrupts becoming masked when an exception is
      taken, local_daif_inherit() is modified to track interrupts becoming
      re-enabled when the original context is inherited. To balance the
      entry/exit paths, each handler masks all DAIF exceptions before
      exit_to_kernel_mode().
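      
      The balance between the entry and exit paths can be sketched with a
      user-space toy model. This is only an approximation of the idea: the
      globals, struct entry_state, its fields, and the model_*() functions are
      hypothetical names, and the real helpers also deal with the scheduler
      tick and IRQ flag tracing:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy CPU state; in the kernel this lives inside RCU and lockdep. */
static bool rcu_watching;
static bool lockdep_irqs_on;

/* Hypothetical per-exception record. */
struct entry_state {
	bool interrupted_irqs_on; /* stand-in for the saved interrupt mask */
	bool exit_rcu;            /* did this entry have to wake RCU? */
};

static void model_enter_from_kernel_mode(struct entry_state *st)
{
	/* Taking the exception masked IRQs, so bring "lockdep" up to date. */
	lockdep_irqs_on = false;

	/* Wake RCU only if the interrupted context had it not watching. */
	st->exit_rcu = !rcu_watching;
	rcu_watching = true;
}

static void model_exit_to_kernel_mode(const struct entry_state *st)
{
	/* The handler body ran with RCU watching. */
	assert(rcu_watching);

	/* Undo exactly what entry did, and nothing more. */
	if (st->exit_rcu)
		rcu_watching = false;
	lockdep_irqs_on = st->interrupted_irqs_on;
}
```

      Recording whether entry woke RCU lets the exit path return the
      interrupted context to the state it was in, whether that was an idle
      sequence or ordinary kernel code.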
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201130115950.22492-10-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: ptrace: prepare for EL1 irq/rcu tracking · 1ec2f2c0
      Mark Rutland authored
      Exceptions from EL1 may be taken when RCU isn't watching (e.g. in idle
      sequences), or when the lockdep hardirqs state is transiently out-of-sync with
      the hardware state (e.g. in the middle of local_irq_enable()). To
      correctly handle these cases, we'll need to save/restore this state
      across some exceptions taken from EL1.
      
      A series of subsequent patches will update EL1 exception handlers to
      handle this. In preparation for this, and to avoid dependencies between
      those patches, this patch adds two new fields to struct pt_regs so that
      exception handlers can track this state.
      
      Note that this is placed in pt_regs as some entry/exit sequences such as
      el1_irq are invoked from assembly, which makes it very difficult to add
      a separate structure as with the irqentry_state used by x86. We can
      separate this once more of the exception logic is moved to C. While the
      fields only need to be bool, they are both made u64 to keep pt_regs
      16-byte aligned.
      
      There should be no functional change as a result of this patch.
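      
      The sizing argument can be checked with a small user-space sketch. The
      layout below is a hypothetical frame rather than the real struct
      pt_regs, and it assumes that "16-byte aligned" means the structure size
      stays a multiple of 16 bytes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register frame: 34 x 8 = 272 bytes, a multiple of 16. */
struct frame_base {
	uint64_t regs[31];
	uint64_t sp;
	uint64_t pc;
	uint64_t pstate;
};

/* Two flags stored as u64: 272 + 16 = 288 bytes, still a multiple of 16. */
struct frame_u64_flags {
	struct frame_base base;
	uint64_t flag_a;
	uint64_t flag_b;
};

/* Two flags stored as bool: 274 bytes plus tail padding on common ABIs. */
struct frame_bool_flags {
	struct frame_base base;
	bool flag_a;
	bool flag_b;
};

/* Bytes by which a size overshoots the previous multiple of 16. */
static size_t frame_slack(size_t size)
{
	return size % 16;
}

/* Compile-time check that the u64 variant keeps the frame size aligned. */
static_assert(sizeof(struct frame_u64_flags) % 16 == 0,
	      "frame size must stay a multiple of 16");
```

      On common ABIs the bool variant leaves a non-zero slack, which is why
      widening both fields is the simpler way to keep the frame size a
      multiple of 16.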
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201130115950.22492-9-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: entry: fix non-NMI user<->kernel transitions · 23529049
      Mark Rutland authored
      When built with PROVE_LOCKING, NO_HZ_FULL, and CONTEXT_TRACKING_FORCE, the
      kernel will WARN() at boot time that interrupts are enabled when we call
      context_tracking_user_enter(), despite the DAIF flags indicating that
      IRQs are masked.
      
      The problem is that we're not tracking IRQ flag changes accurately, and
      so lockdep believes interrupts are enabled when they are not (and
      vice-versa). We can shuffle things to make this more accurate. For
      kernel->user transitions there are a number of constraints we need to
      consider:
      
      1) When we call __context_tracking_user_enter() HW IRQs must be disabled
         and lockdep must be up-to-date with this.
      
      2) Userspace should be treated as having IRQs enabled from the PoV of
         both lockdep and tracing.
      
      3) As context_tracking_user_enter() stops RCU from watching, we cannot
         use RCU after calling it.
      
      4) IRQ flag tracing and lockdep have state that must be manipulated
         before RCU is disabled.
      
      ... with similar constraints applying for user->kernel transitions, with
      the ordering reversed.
      
      The generic entry code has enter_from_user_mode() and
      exit_to_user_mode() helpers to handle this. We can't use those directly,
      so we add arm64 copies for now (without the instrumentation markers
      which aren't used on arm64). These replace the existing user_exit() and
      user_exit_irqoff() calls spread throughout handlers, and the exception
      unmasking is left as-is.
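      
      The ordering constraints listed above can be sketched as a user-space
      toy model. This is not the arm64 or generic entry code: struct cpu_model
      and both model_*() functions are hypothetical, and each assert() only
      encodes one of the numbered constraints:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the constraints talk about. */
struct cpu_model {
	bool hw_irqs_masked;  /* what the DAIF flags would say */
	bool lockdep_irqs_on; /* what lockdep believes */
	bool trace_irqs_on;   /* what IRQ flag tracing last reported */
	bool rcu_watching;
};

/* kernel->user: do the RCU-dependent work first, then stop RCU watching. */
static void model_exit_to_user_mode(struct cpu_model *c)
{
	/* (1) HW IRQs are masked and lockdep agrees with that. */
	assert(c->hw_irqs_masked && !c->lockdep_irqs_on);

	/* (4) Tracing state is updated while RCU is still watching. */
	assert(c->rcu_watching);
	c->trace_irqs_on = true;

	/* (3) Context tracking stops RCU; no RCU use after this point. */
	c->rcu_watching = false;

	/* (2) Userspace is treated as running with IRQs enabled. */
	c->lockdep_irqs_on = true;
}

/* user->kernel: the same steps with the ordering reversed. */
static void model_enter_from_user_mode(struct cpu_model *c)
{
	/* Taking the exception masks IRQs; lockdep learns this first. */
	c->hw_irqs_masked = true;
	c->lockdep_irqs_on = false;

	/* Wake RCU before anything that depends on it. */
	c->rcu_watching = true;

	/* Tracing may rely on RCU, so it comes last. */
	assert(c->rcu_watching);
	c->trace_irqs_on = false;
}
```

      Calling model_exit_to_user_mode() while the model's lockdep state still
      claims IRQs are enabled trips the first assertion, loosely mirroring the
      boot-time WARN() described at the top of this message.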
      
      Note that:
      
      * The accounting for debug exceptions from userspace now happens in
        el0_dbg() and ret_to_user(), so this is removed from
        debug_exception_enter() and debug_exception_exit(). As
        user_exit_irqoff() wakes RCU, the userspace-specific check is removed.
      
      * The accounting for syscalls now happens in el0_svc(),
        el0_svc_compat(), and ret_to_user(), so this is removed from
        el0_svc_common(). This does not adversely affect the workaround for
        erratum 1463225, as this does not depend on any of the state tracking.
      
      * In ret_to_user() we mask interrupts with local_daif_mask(), and so we
        need to inform lockdep and tracing. Here a trace_hardirqs_off() is
        sufficient and safe as we have not yet exited kernel context and RCU
        is usable.
      
      * As PROVE_LOCKING selects TRACE_IRQFLAGS, the ifdeferry in entry.S only
        needs to check for the latter.
      
      * EL0 SError handling will be dealt with in a subsequent patch, as this
        needs to be treated as an NMI.
      
      Prior to this patch, booting an appropriately-configured kernel would
      result in splats as below:
      
      | DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
      | WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5280 check_flags.part.54+0x1dc/0x1f0
      | Modules linked in:
      | CPU: 2 PID: 1 Comm: init Not tainted 5.10.0-rc3 #3
      | Hardware name: linux,dummy-virt (DT)
      | pstate: 804003c5 (Nzcv DAIF +PAN -UAO -TCO BTYPE=--)
      | pc : check_flags.part.54+0x1dc/0x1f0
      | lr : check_flags.part.54+0x1dc/0x1f0
      | sp : ffff80001003bd80
      | x29: ffff80001003bd80 x28: ffff66ce801e0000
      | x27: 00000000ffffffff x26: 00000000000003c0
      | x25: 0000000000000000 x24: ffffc31842527258
      | x23: ffffc31842491368 x22: ffffc3184282d000
      | x21: 0000000000000000 x20: 0000000000000001
      | x19: ffffc318432ce000 x18: 0080000000000000
      | x17: 0000000000000000 x16: ffffc31840f18a78
      | x15: 0000000000000001 x14: ffffc3184285c810
      | x13: 0000000000000001 x12: 0000000000000000
      | x11: ffffc318415857a0 x10: ffffc318406614c0
      | x9 : ffffc318415857a0 x8 : ffffc31841f1d000
      | x7 : 647261685f706564 x6 : ffffc3183ff7c66c
      | x5 : ffff66ce801e0000 x4 : 0000000000000000
      | x3 : ffffc3183fe00000 x2 : ffffc31841500000
      | x1 : e956dc24146b3500 x0 : 0000000000000000
      | Call trace:
      |  check_flags.part.54+0x1dc/0x1f0
      |  lock_is_held_type+0x10c/0x188
      |  rcu_read_lock_sched_held+0x70/0x98
      |  __context_tracking_enter+0x310/0x350
      |  context_tracking_enter.part.3+0x5c/0xc8
      |  context_tracking_user_enter+0x6c/0x80
      |  finish_ret_to_user+0x2c/0x13c
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201130115950.22492-8-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: entry: move el1 irq/nmi logic to C · 105fc335
      Mark Rutland authored
      In preparation for reworking the EL1 irq/nmi entry code, move the
      existing logic to C. We no longer need the asm_nmi_enter() and
      asm_nmi_exit() wrappers, so these are removed. The new C functions are
      marked noinstr, which prevents compiler instrumentation and runtime
      probing.
      
      In subsequent patches we'll want the new C helpers to be called in all
      cases, so we don't bother wrapping the calls with ifdeferry. Even when
      the new C functions are stubs the trivial calls are unlikely to have a
      measurable impact on the IRQ or NMI paths anyway.
      
      Prototypes are added to <asm/exception.h> as otherwise (in some
      configurations) GCC will complain about the lack of a forward
      declaration. We already do this for existing functions, e.g.
      enter_from_user_mode().
      
      The new helpers are marked as noinstr (which prevents all
      instrumentation, tracing, and kprobes). Otherwise, there should be no
      functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201130115950.22492-7-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: entry: prepare ret_to_user for function call · 3cb5ed4d
      Mark Rutland authored
      In a subsequent patch ret_to_user will need to make a C function call
      (in some configurations) which may clobber x0-x18 at the start of the
      finish_ret_to_user block, before enable_step_tsk consumes the flags
      loaded into x1.
      
      In preparation for this, let's load the flags into x19, which is
      preserved across C function calls. This avoids a redundant reload of the
      flags and ensures we operate on a consistent snapshot regardless.
      
      There should be no functional change as a result of this patch. At this
      point of the entry/exit paths we only need to preserve x28 (tsk) and the
      sp, and x19 is free for this use.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201130115950.22492-6-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: entry: move enter_from_user_mode to entry-common.c · 2f911d49
      Mark Rutland authored
      In later patches we'll want to extend enter_from_user_mode() and add a
      corresponding exit_to_user_mode(). As these will be common for all
      entries/exits from userspace, it'd be better for these to live in
      entry-common.c with the rest of the entry logic.
      
      This patch moves enter_from_user_mode() into entry-common.c. As with
      other functions in entry-common.c it is marked as noinstr (which
      prevents all instrumentation, tracing, and kprobes) but there are no
      other functional changes.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201130115950.22492-5-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: entry: mark entry code as noinstr · da192676
      Mark Rutland authored
      Functions in entry-common.c are marked as notrace and NOKPROBE_SYMBOL(),
      but they're still subject to other instrumentation which may rely on
      lockdep/rcu/context-tracking being up-to-date, and may cause nested
      exceptions (e.g. for WARN/BUG or KASAN's use of BRK) which will corrupt
      exception registers which have not yet been read.
      
      Prevent this by marking all functions in entry-common.c as noinstr to
      prevent compiler instrumentation. This also blacklists the functions for
      tracing and kprobes, so we don't need to handle that separately.
      Functions elsewhere will be dealt with in subsequent patches.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201130115950.22492-4-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: mark idle code as noinstr · 114e0a68
      Mark Rutland authored
      Core code disables RCU when calling arch_cpu_idle(), so it's not safe
      for arch_cpu_idle() or its callees to be instrumented, as the
      instrumentation callbacks may attempt to use RCU or other features which
      are unsafe to use in this context.
      
      Mark them noinstr to prevent issues.
      
      The use of local_irq_enable() in arch_cpu_idle() is similarly
      problematic, and the "sched/idle: Fix arch_cpu_idle() vs tracing" patch
      queued in the tip tree addresses that case.
      Reported-by: Marco Elver <elver@google.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201130115950.22492-3-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: syscall: exit userspace before unmasking exceptions · ca1314d7
      Mark Rutland authored
      In el0_svc_common() we unmask exceptions before we call user_exit(), and
      so there's a window where an IRQ or debug exception can be taken while
      RCU is not watching. In do_debug_exception() we account for this via
      debug_exception_{enter,exit}(), but in the el1_irq asm we do not and we
      call trace functions which rely on RCU before we have a guarantee that
      RCU is watching.
      
      Let's avoid this by having el0_svc_common() exit userspace before
      unmasking exceptions, matching what we do for all other EL0 entry paths.
      We can use user_exit_irqoff() to avoid the pointless save/restore of IRQ
      flags while we're sure exceptions are masked in DAIF.
      
      The workaround for Cortex-A76 erratum 1463225 may trigger a debug
      exception before this point, but the debug code invoked in this case is
      safe even when RCU is not watching.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201130115950.22492-2-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
  2. 23 Nov, 2020 4 commits
  3. 13 Nov, 2020 5 commits
  4. 10 Nov, 2020 4 commits
  5. 05 Nov, 2020 1 commit
  6. 03 Nov, 2020 2 commits
  7. 30 Oct, 2020 2 commits
  8. 29 Oct, 2020 2 commits
  9. 28 Oct, 2020 9 commits