  1. 03 Jul, 2019 1 commit
    • powerpc/64s/exception: optimise system_reset for idle, clean up non-idle case · 0e10be2b
      Nicholas Piggin authored
      The idle wakeup code in the system reset interrupt is suboptimal.
      There are two requirements: perform idle wakeup quickly; and, for
      non-idle interrupts, save everything including CFAR, with no
      performance requirement.
      
      The problem with placing the idle test in the middle of the handler
      and using the normal handler code to save CFAR is that it is quite
      costly (e.g., mfcfar is serialising, speculative workarounds get
      applied, SRR1 has to be reloaded, etc.). It also prevents the
      standard interrupt handler boilerplate from being used.
      
      This cost can be avoided by placing a dedicated idle test at the
      start of the interrupt handler, which restores all registers to
      their entry state in case it was not an idle wakeup. CFAR is
      preserved without saving it first by making the non-idle case the
      fall-through path, while idle is a taken branch.
      
      Performance seems to be in the noise, but is possibly around 0.5%
      faster; the executed instruction sequence certainly looks better.
      The bigger benefit is being able to drop in standard interrupt
      handlers after the idle code, which helps with subsequent cleanup
      and consolidation.
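      
      A minimal C sketch of the reordered control flow; the function
      names and the SRR1 mask value are illustrative, not the kernel's
      (the real code is assembly in exceptions-64s.S):
      
        #include <stdbool.h>
        #include <stdio.h>
        
        /* Hypothetical test standing in for the assembly check of the
         * SRR1 wake bits; the mask here is illustrative only. */
        static bool srr1_wake_from_idle(unsigned long srr1)
        {
                return (srr1 & 0x0030000000000000UL) != 0;
        }
        
        static void idle_wakeup(void)      { puts("fast idle wakeup"); }
        static void standard_handler(void) { puts("full save incl. CFAR"); }
        
        static void system_reset(unsigned long srr1)
        {
                /* Idle is the taken branch: a quick test first, with
                 * registers restored to their entry state otherwise. */
                if (srr1_wake_from_idle(srr1)) {
                        idle_wakeup();
                        return;
                }
                /* Fall-through: CFAR has not been clobbered yet, so the
                 * standard boilerplate can save it with everything else. */
                standard_handler();
        }
        
        int main(void)
        {
                system_reset(0x0010000000000000UL); /* idle wakeup  */
                system_reset(0);                    /* non-idle case */
                return 0;
        }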
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Fixup BE by using DOTSYM for idle_return_gpr_loss call]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 02 Jul, 2019 34 commits
  3. 25 Jun, 2019 1 commit
    • powerpc/64s/exception: Fix machine check early corrupting AMR · e13e7cd4
      Nicholas Piggin authored
      The early machine check runs in real mode, so locking is unnecessary.
      Worse, the windup does not restore AMR, so this can result in a false
      KUAP fault after a recoverable machine check hits inside a user copy
      operation.
      
      Fix this similarly to HMI by simply avoiding the kuap lock in the
      early machine check handler; the lock will be taken by the late
      handler, which runs in virtual mode, if that handler runs. If the
      virtual mode handler is reached, it will lock and restore the AMR.
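      
      A rough C sketch of the fix's shape, assuming stand-in helper names
      rather than the real KUAP macros and machine check entry points:
      
        /* Stubs standing in for the real handlers and KUAP helpers. */
        static void handle_mce_real_mode(void) { }
        static void handle_mce_virt_mode(void) { }
        static unsigned long kuap_lock_sketch(void) { return 0; }
        static void kuap_restore_sketch(unsigned long amr) { (void)amr; }
        
        /* Early handler: real mode, must not touch the AMR. */
        static void machine_check_early_sketch(void)
        {
                /* No KUAP lock here: the interrupted context's AMR is
                 * left alone, so the windup cannot return with user
                 * access spuriously blocked. */
                handle_mce_real_mode();
        }
        
        /* Late handler: virtual mode, normal interrupt rules apply. */
        static void machine_check_late_sketch(void)
        {
                unsigned long amr = kuap_lock_sketch();
        
                handle_mce_virt_mode();
                kuap_restore_sketch(amr);
        }
        
        int main(void)
        {
                machine_check_early_sketch();
                machine_check_late_sketch();
                return 0;
        }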
      
      Fixes: 890274c2 ("powerpc/64s: Implement KUAP for Radix MMU")
      Cc: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 19 Jun, 2019 1 commit
    • powerpc/watchpoint: Restore NV GPRs while returning from exception · f474c28f
      Ravi Bangoria authored
      powerpc hardware triggers a watchpoint before executing the
      instruction. To provide trigger-after-execute behaviour, the kernel
      emulates the instruction. If the instruction is a 'load something
      into a non-volatile register', the exception handler must restore
      the emulated register state when returning; otherwise the register
      state is corrupted. For example, adding a watchpoint on a list can
      corrupt the list:
      
        # cat /proc/kallsyms | grep kthread_create_list
        c00000000121c8b8 d kthread_create_list
      
      Add watchpoint on kthread_create_list->prev:
      
        # perf record -e mem:0xc00000000121c8c0
      
      Run some workload such that a new kthread gets created; e.g., I
      just logged out from the console:
      
        list_add corruption. next->prev should be prev (c000000001214e00), \
      	but was c00000000121c8b8. (next=c00000000121c8b8).
        WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
        CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
        ...
        NIP __list_add_valid+0xb4/0xc0
        LR __list_add_valid+0xb0/0xc0
        Call Trace:
        __list_add_valid+0xb0/0xc0 (unreliable)
        __kthread_create_on_node+0xe0/0x260
        kthread_create_on_node+0x34/0x50
        create_worker+0xe8/0x260
        worker_thread+0x444/0x560
        kthread+0x160/0x1a0
        ret_from_kernel_thread+0x5c/0x70
      
      The list corruption happened because the code uses a 'load into
      non-volatile register' instruction:
      
      Snippet from __kthread_create_on_node:
      
        c000000000136be8:     addis   r29,r2,-19
        c000000000136bec:     ld      r29,31424(r29)
              if (!__list_add_valid(new, prev, next))
        c000000000136bf0:     mr      r3,r30
        c000000000136bf4:     mr      r5,r28
        c000000000136bf8:     mr      r4,r29
        c000000000136bfc:     bl      c00000000059a2f8 <__list_add_valid+0x8>
      
      Register state from WARN_ON():
      
        GPR00: c00000000059a3a0 c000007ff23afb50 c000000001344e00 0000000000000075
        GPR04: 0000000000000000 0000000000000000 0000001852af8bc1 0000000000000000
        GPR08: 0000000000000001 0000000000000007 0000000000000006 00000000000004aa
        GPR12: 0000000000000000 c000007ffffeb080 c000000000137038 c000005ff62aaa00
        GPR16: 0000000000000000 0000000000000000 c000007fffbe7600 c000007fffbe7370
        GPR20: c000007fffbe7320 c000007fffbe7300 c000000001373a00 0000000000000000
        GPR24: fffffffffffffef7 c00000000012e320 c000007ff23afcb0 c000000000cb8628
        GPR28: c00000000121c8b8 c000000001214e00 c000007fef5b17e8 c000007fef5b17c0
      
      Watchpoint hit at 0xc000000000136bec.
      
        addis   r29,r2,-19
         => r29 = 0xc000000001344e00 + (-19 << 16)
         => r29 = 0xc000000001214e00
      
        ld      r29,31424(r29)
         => r29 = *(0xc000000001214e00 + 31424)
         => r29 = *(0xc00000000121c8c0)
      
      0xc00000000121c8c0 is where we placed the watchpoint, and thus this
      instruction was emulated by emulate_step. But because
      handle_dabr_fault did not restore the emulated register state, r29
      still contains the stale value in the register dump above.
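      
      A toy C model of the bug, with arrays standing in for real
      registers and struct pt_regs (the names and values here are
      illustrative, not the actual kernel code): emulate_step() writes
      its result into the saved register image, so the exception return
      path must reload the non-volatile GPRs (r14-r31) from that image,
      not just the volatiles.
      
        #include <string.h>
        
        struct pt_regs_model { unsigned long gpr[32]; };
        
        /* Emulation writes its result into the saved image only. */
        static void emulate_load_sketch(struct pt_regs_model *regs,
                                        int rt, unsigned long value)
        {
                regs->gpr[rt] = value;
        }
        
        static void exception_return(unsigned long cpu_gprs[32],
                                     const struct pt_regs_model *regs,
                                     int restore_nvgprs)
        {
                /* Volatile GPRs always come back from the saved frame. */
                memcpy(cpu_gprs, regs->gpr, 14 * sizeof(unsigned long));
                /* The bug: the non-restoring exit left r14-r31 untouched
                 * even though emulation had updated r29 in the frame. */
                if (restore_nvgprs)
                        memcpy(&cpu_gprs[14], &regs->gpr[14],
                               18 * sizeof(unsigned long));
        }
        
        int main(void)
        {
                unsigned long cpu[32] = { 0 };
                struct pt_regs_model regs = {{ 0 }};
        
                emulate_load_sketch(&regs, 29, 0x1234); /* emulated ld */
                exception_return(cpu, &regs, 1); /* fixed: r29 restored */
                return cpu[29] == 0x1234 ? 0 : 1;
        }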
      
      Fixes: 5aae8a53 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors")
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: stable@vger.kernel.org # 2.6.36+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 30 Apr, 2019 1 commit
    • powerpc/64s: Reimplement book3s idle code in C · 10d91611
      Nicholas Piggin authored
      Reimplement the Book3S idle code in C, moving the POWER7/8/9
      implementation-specific HV idle code to the powernv platform code.
      
      Book3S assembly stubs are kept in common code and used only to save
      the stack frame and non-volatile GPRs before executing the
      architected idle instructions, and to restore the stack, reload
      GPRs, and return to C after waking from idle.
      
      The complex logic dealing with threads and subcores, locking, SPRs,
      HMIs, timebase resync, etc., is all done in C which makes it more
      maintainable.
      
      This is not a strict translation to C; there are some significant
      differences (see the sketch after this list):
      
      - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
        but saves and restores them itself.
      
      - The optimisation where EC=ESL=0 idle modes did not have to save GPRs
        or change MSR is restored, because it's now simple to do. ESL=1
        sleeps that do not lose GPRs can use this optimization too.
      
      - KVM secondary entry and cede is now more of a call/return style
        rather than branchy. nap_state_lost is not required because KVM
        always returns via NVGPR restoring path.
      
      - KVM secondary wakeup from offline sequence is moved entirely into
        the offline wakeup, which avoids a hwsync in the normal idle wakeup
        path.
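      
      A minimal C sketch of the call/return structure described above;
      the function names are illustrative stand-ins, not the actual
      kernel symbols:
      
        #include <stdio.h>
        
        /* Thin stand-in for the assembly stub that saves the stack
         * frame and NVGPRs, executes "stop", and returns SRR1 on
         * wakeup. */
        static unsigned long enter_stop_asm(unsigned long psscr)
        {
                printf("stop, psscr=%#lx\n", psscr);
                return 0;
        }
        
        static void save_sprs(void)    { /* save SPRs the state may lose */ }
        static void restore_sprs(void) { /* reload them in C on wakeup   */ }
        
        /* All bookkeeping lives in C; both KVM and host simply return
         * here, which is why nap_state_lost is no longer needed. */
        static unsigned long idle_stop(unsigned long psscr)
        {
                unsigned long srr1;
        
                save_sprs();            /* replaces ->cpu_restore reinit */
                srr1 = enter_stop_asm(psscr);
                restore_sprs();
                return srr1;
        }
        
        int main(void)
        {
                return (int)idle_stop(0x300UL);
        }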
      
      Performance, measured with context-switch ping-pong on different
      threads or cores, is possibly improved by a small amount, 1-3% for
      shallow stop states depending on the stop state and whether the
      test is core or thread based. For deep states the difference is in
      the noise compared with other latencies.
      
      KVM improvements:
      
      - Idle sleepers now always return to caller rather than branch out
        to KVM first.
      
      - This allows optimisations like very fast return to caller when no
        state has been lost.
      
      - KVM no longer requires nap_state_lost because it controls NVGPR
        save/restore itself on the way in and out.
      
      - The heavy idle wakeup KVM request check can be moved out of the
        normal host idle code and into the not-performance-critical offline
        code.
      
      - KVM nap code now returns from where it is called, which makes the
        flow a bit easier to follow.
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Squash the KVM changes in]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  6. 21 Apr, 2019 1 commit
    • powerpc/64s: Implement KUAP for Radix MMU · 890274c2
      Michael Ellerman authored
      Kernel Userspace Access Prevention utilises a feature of the Radix MMU
      which disallows read and write access to userspace addresses. By
      utilising this, the kernel is prevented from accessing user data from
      outside of trusted paths that perform proper safety checks, such as
      copy_{to/from}_user() and friends.
      
      Userspace access is disabled from early boot and is only enabled when
      performing an operation like copy_{to/from}_user(). The register that
      controls this (AMR) does not prevent userspace from accessing itself,
      so there is no need to save and restore when entering and exiting
      userspace.
      
      When entering the kernel from the kernel, we save the AMR and, if
      it is not blocking user access (because, e.g., we faulted doing a
      user access), we re-block user access for the duration of the
      exception (i.e. the page fault) and then restore the AMR when
      returning to the kernel.
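      
      A rough C sketch of the mechanism; the constant, the shadow
      variable, and the helper names are illustrative (the real code
      writes the AMR SPR with mtspr):
      
        /* Illustrative value: deny user read and write in the AMR. */
        #define AMR_USER_BLOCKED 0xC000000000000000UL
        
        static unsigned long amr_shadow = AMR_USER_BLOCKED; /* fake SPR */
        
        static void allow_user_access(void)   { amr_shadow = 0; }
        static void prevent_user_access(void) { amr_shadow = AMR_USER_BLOCKED; }
        
        /* Trusted path: open the user-access window only around the
         * actual copy, as copy_{to/from}_user() does. */
        static long copy_to_user_sketch(void *to, const void *from,
                                        unsigned long n)
        {
                allow_user_access();
                __builtin_memcpy(to, from, n); /* the user access */
                prevent_user_access();
                return 0;
        }
        
        /* CONFIG_PPC_KUAP_DEBUG-style paranoia: on switch/syscall
         * return, user access must be blocked again. */
        static void kuap_assert_locked_sketch(void)
        {
                if (amr_shadow != AMR_USER_BLOCKED)
                        __builtin_trap();
        }
        
        int main(void)
        {
                char dst[4];
        
                copy_to_user_sketch(dst, "hi", 3);
                kuap_assert_locked_sketch();
                return 0;
        }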
      
      This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y)
      and performing the following:
      
        # (echo ACCESS_USERSPACE) > [debugfs]/provoke-crash/DIRECT
      
      If enabled, this should send SIGSEGV to the thread.
      
      We also add paranoid checking of the AMR on context switch and
      syscall return under CONFIG_PPC_KUAP_DEBUG.
      Co-authored-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  7. 08 Apr, 2019 1 commit