- 03 Jul, 2019 1 commit
-
Nicholas Piggin authored
The idle wake-up code in the system reset interrupt is not very optimal. There are two requirements: perform idle wake-up quickly; and save everything, including CFAR, for non-idle interrupts, with no performance requirement. The problem with placing the idle test in the middle of the handler and using the normal handler code to save CFAR is that it is quite costly (e.g., mfcfar is serialising, speculative workarounds get applied, SRR1 has to be reloaded, etc.). It also prevents the standard interrupt handler boilerplate from being used.

This pain can be avoided by using a dedicated idle interrupt handler at the start of the interrupt handler, which restores all registers back to the way they were in case it was not an idle wake-up. CFAR is preserved without saving it before the non-idle case by making that the fall-through, while idle is a taken branch. Performance seems to be in the noise, but possibly around 0.5% faster; the executed instructions certainly look better. The bigger benefit is being able to drop in standard interrupt handlers after the idle code, which helps with subsequent cleanup and consolidation.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fixup BE by using DOTSYM for idle_return_gpr_loss call]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
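CFAR latches the address of the most recent taken branch, so a not-taken (fall-through) path leaves it intact for the handler to save later, while the idle path can afford to clobber it. A minimal sketch of that arrangement, assuming GNU as with -mregnames; the labels are illustrative, though the wake-reason test mirrors the SRR1[46:47] check the 64s code uses:

        .set    SPRN_SRR1, 27
        .set    SPRN_CFAR, 28

        system_reset_sketch:
                mfspr   r12, SPRN_SRR1
                rlwinm. r11, r12, 47-31, 30, 31 /* extract the SRR1[46:47] wake-reason bits */
                bne     idle_wakeup             /* taken branch: clobbers CFAR, which idle does not need */
                mfspr   r10, SPRN_CFAR          /* fall-through: CFAR still holds its pre-interrupt value */
                /* ... standard interrupt handler boilerplate continues here ... */
                blr
        idle_wakeup:
                blr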
-
- 02 Jul, 2019 34 commits
-
Nicholas Piggin authored
The bad stack test in interrupt handlers has a few problems. For performance, it is taken in the common case, which is a fetch bubble and a waste of i-cache. For code development and maintenance, it requires yet another stack frame setup routine, and it constrains all exception handlers to follow the same register save pattern, which inhibits future optimisation.

Remove the test/branch and replace it with a trap. Teach the program check handler to use the emergency stack for this case. This does not result in quite so nice a message, but the SRR0 and SRR1 of the crashed interrupt can be seen in r11 and r12, as can the original r1 (adjusted by INT_FRAME_SIZE). These are the most important parts for debugging the issue. The original r9-r12 and cr0 are lost, which is the main downside.

  kernel BUG at linux/arch/powerpc/kernel/exceptions-64s.S:847!
  Oops: Exception in kernel mode, sig: 5 [#1]
  BE SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted
  NIP:  c000000000009108 LR: c000000000cadbcc CTR: c0000000000090f0
  REGS: c0000000fffcbd70 TRAP: 0700   Not tainted
  MSR:  9000000000021032 <SF,HV,ME,IR,DR,RI>  CR: 28222448  XER: 20040000
  CFAR: c000000000009100 IRQMASK: 0
  GPR00: 000000000000003d fffffffffffffd00 c0000000018cfb00 c0000000f02b3166
  GPR04: fffffffffffffffd 0000000000000007 fffffffffffffffb 0000000000000030
  GPR08: 0000000000000037 0000000028222448 0000000000000000 c000000000ca8de0
  GPR12: 9000000002009032 c000000001ae0000 c000000000010a00 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: c0000000f00322c0 c000000000f85200 0000000000000004 ffffffffffffffff
  GPR24: fffffffffffffffe 0000000000000000 0000000000000000 000000000000000a
  GPR28: 0000000000000000 0000000000000000 c0000000f02b391c c0000000f02b3167
  NIP [c000000000009108] decrementer_common+0x18/0x160
  LR  [c000000000cadbcc] .vsnprintf+0x3ec/0x4f0
  Call Trace:
  Instruction dump:
  996d098a 994d098b 38610070 480246ed 48005518 60000000 38200000 718a4000
  7c2a0b78 3821fd00 41c20008 e82d0970 <0981fd00> f92101a0 f9610170 f9810178

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
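The replacement pattern relies on a conditional trap, which raises a program check (the TRAP: 0700 above) only when the test fails, so the good-stack path executes no taken branch. An illustrative sketch of the idea, assuming -mregnames; this is not the kernel's exact test, and the frame size is made up:

        subi    r1, r1, 0x220           /* carve the interrupt frame (size illustrative) */
        tdgei   r1, 0                   /* trap if r1 >= 0 (signed): kernel stacks live at 'negative' addresses */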
-
Nicholas Piggin authored
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Although the 0x1500 interrupt only applies to bare metal, it is better to just use the standard macro for scratch save. Runtime code path remains unchanged (due to instruction patching).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Some exception entry requires DAR and/or DSISR to be saved into the paca exception save area. Add options to the standard exception macros for these. Generated code changes slightly due to code structure.

  - 554:   a6 02 72 7d     mfdsisr r11
  - 558:   a8 00 4d f9     std     r10,168(r13)
  - 55c:   b0 00 6d 91     stw     r11,176(r13)
  + 554:   a8 00 4d f9     std     r10,168(r13)
  + 558:   a6 02 52 7d     mfdsisr r10
  + 55c:   b0 00 4d 91     stw     r10,176(r13)

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
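A sketch of what such an optional save looks like as a gas macro, with .if guards on the new arguments. The macro name and paca offsets are made up (the SPR numbers are architected); assumes -mregnames:

        .set    SPRN_DSISR, 18
        .set    SPRN_DAR,   19
        .set    EX_DAR,   168           /* illustrative offsets into the paca save area */
        .set    EX_DSISR, 176

        .macro SAVE_DAR_DSISR dar=0 dsisr=0
                .if \dar
                mfspr   r10, SPRN_DAR
                std     r10, EX_DAR(r13)
                .endif
                .if \dsisr
                mfspr   r10, SPRN_DSISR
                stw     r10, EX_DSISR(r13)      /* DSISR is 32-bit, hence stw */
                .endif
        .endm

        /* usage: e.g. a data storage interrupt asks for both */
        SAVE_DAR_DSISR dar=1 dsisr=1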
-
Nicholas Piggin authored
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Restore all SPRs and CR up-front; these are longer-latency instructions. Move the register restores around to maximise pairs of adjacent loads (e.g., restore r0 next to r1).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Duplicate the HMI windup code for both cases, rather than putting a special-case branch in the middle of it. Remove an unused label. This helps with later code consolidation.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Move the in_mce decrement earlier, before registers are restored (but still after RI=0). This helps with later consolidation.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Trivial code change, r3->r9.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
All supported 64s CPUs support the mtmsrd L=1 instruction, so a cleanup can be made in the sreset and mce handlers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
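For context, mtmsrd with L=1 alters only MSR[EE] and MSR[RI], so a handler can drop RI without a read-modify-write of the whole MSR. A minimal sketch, assuming -mregnames:

        li      r9, 0
        mtmsrd  r9, 1           /* L=1: EE=0, RI=0; every other MSR bit is preserved */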
-
Nicholas Piggin authored
Move SPR reads ahead of writes. Real mode entry that is not a KVM guest is rare these days, but bad practice propagates.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
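A sketch of the ordering, assuming -mregnames; the register choices are illustrative. The point is to issue the SPR reads before queuing the writes rather than interleaving them:

        .set    SPRN_SRR0, 26
        .set    SPRN_SRR1, 27

        mfspr   r11, SPRN_SRR0          /* reads first */
        mfspr   r12, SPRN_SRR1
        mtspr   SPRN_SRR0, r9           /* then writes */
        mtspr   SPRN_SRR1, r10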
-
Nicholas Piggin authored
syscall/hcall entry unnecessarily differs between KVM and non-KVM builds. Move the SMT priority instruction to the same location (after INTERRUPT_TO_KERNEL).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
No generated code change. Final vmlinux is changed only due to a change in bug table line numbers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Generally, macros that expand to instructions are indented by a tab, while those that do not expand to anything are not indented. Fix the obvious cases that go contrary to this style. No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
After the previous cleanup, it becomes possible to consolidate some common code outside the runtime alternate patching. Also remove unused labels. This results in some code change, but the runtime instruction sequence is unchanged.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Many of these macros just specify 1-4 lines and are used at most a few times each, often just once. Remove this indirection. No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
More cases of code insertion via macros that do not add a great deal. All the additions have to be specified in the macro arguments, so they can just as well go after the macro. No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
The aim is to reduce the amount of indirection it takes to get through the exception handler macros, particularly where it provides little code sharing. No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Conditionally expand the skip case if it is specified. No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Conditionally expand the soft-masking test if a mask is passed in. No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Rather than passing in the soft-masking and KVM tests via a macro that is passed to another macro to expand, switch to using gas macros that conditionally expand the soft-masking and KVM tests. The system reset with its idle test is open-coded, as it is a one-off. No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
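The gas feature doing the work in this group of patches is conditional assembly inside macros: .ifnb expands its body only when an argument was actually passed. A minimal sketch with made-up names and a made-up paca offset, assuming -mregnames:

        .set    PACAIRQSOFTMASK, 0x280  /* illustrative paca offset */

        .macro TEST_SOFT_MASK mask=
                .ifnb \mask                     /* emit the test only if a mask was given */
                lbz     r10, PACAIRQSOFTMASK(r13)
                andi.   r10, r10, \mask
                bne     1f                      /* masked: go to the masked-interrupt path */
                .endif
        .endm

        handler_a:
                TEST_SOFT_MASK 0x01             /* soft-maskable: the test expands */
                nop                             /* ... normal handling ... */
        1:      blr

        handler_b:
                TEST_SOFT_MASK                  /* not maskable: no code emitted at all */
                blr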
-
Nicholas Piggin authored
The sreset handler KVM test should not, in theory, depend on P7. In practice KVM now only supports P7 and up, so this is not a real bug fix, but the change is made now so the quirk does not propagate through the cleanup patches.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
- Rename the macros to _REAL and _VIRT suffixes, rather than no suffix and a _RELON suffix.
- Move the macro definitions together in the file.
- Move the RELOCATABLE ifdef inside the _VIRT macro.

Further consolidation between variants does not buy much here. No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Switch to a gas macro that conditionally expands the RI clearing instruction. No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Replace all instances of this with gas macros that test the hsrr parameter and use the appropriate register names/labels. No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Remove extraneous 2nd check for 0xea0 in SOFTEN_TEST]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
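A sketch of the hsrr-parameter style: one gas macro picks the HSRR0/1 or SRR0/1 registers from its argument, instead of duplicating the text. The SPR numbers are architected; the macro name is made up. Assumes -mregnames:

        .set    SPRN_SRR0,  26
        .set    SPRN_SRR1,  27
        .set    SPRN_HSRR0, 314
        .set    SPRN_HSRR1, 315

        .macro GET_SRRS hsrr
                .if \hsrr
                mfspr   r11, SPRN_HSRR0
                mfspr   r12, SPRN_HSRR1
                .else
                mfspr   r11, SPRN_SRR0
                mfspr   r12, SPRN_SRR1
                .endif
        .endm

        GET_SRRS 0              /* a standard interrupt: uses SRR0/SRR1 */
        GET_SRRS 1              /* a hypervisor interrupt: uses HSRR0/HSRR1 */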
-
- 25 Jun, 2019 1 commit
-
Nicholas Piggin authored
The early machine check runs in real mode, so locking is unnecessary. Worse, the windup does not restore AMR, so this can result in a false KUAP fault after a recoverable machine check hits inside a user copy operation. Fix this similarly to HMI, by just avoiding the KUAP lock in the early machine check handler (it will be set by the late handler that runs in virtual mode, if that runs). If the virtual mode handler is reached, it will lock and restore the AMR.

Fixes: 890274c2 ("powerpc/64s: Implement KUAP for Radix MMU")
Cc: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 19 Jun, 2019 1 commit
-
Ravi Bangoria authored
powerpc hardware triggers a watchpoint before executing the instruction. To provide trigger-after-execute behaviour, the kernel emulates the instruction. If the instruction is a 'load something into a non-volatile register', the exception handler should restore the emulated register state while returning; otherwise there will be register state corruption. For example, adding a watchpoint on a list can corrupt the list:

  # cat /proc/kallsyms | grep kthread_create_list
  c00000000121c8b8 d kthread_create_list

Add a watchpoint on kthread_create_list->prev:

  # perf record -e mem:0xc00000000121c8c0

Run some workload such that a new kthread gets invoked, e.g. just log out from the console:

  list_add corruption. next->prev should be prev (c000000001214e00), \
  but was c00000000121c8b8. (next=c00000000121c8b8).
  WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
  CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
  ...
  NIP __list_add_valid+0xb4/0xc0
  LR  __list_add_valid+0xb0/0xc0
  Call Trace:
    __list_add_valid+0xb0/0xc0 (unreliable)
    __kthread_create_on_node+0xe0/0x260
    kthread_create_on_node+0x34/0x50
    create_worker+0xe8/0x260
    worker_thread+0x444/0x560
    kthread+0x160/0x1a0
    ret_from_kernel_thread+0x5c/0x70

The list corruption happened because the code uses a 'load into non-volatile register' instruction. Snippet from __kthread_create_on_node:

  c000000000136be8:  addis   r29,r2,-19
  c000000000136bec:  ld      r29,31424(r29)
                     if (!__list_add_valid(new, prev, next))
  c000000000136bf0:  mr      r3,r30
  c000000000136bf4:  mr      r5,r28
  c000000000136bf8:  mr      r4,r29
  c000000000136bfc:  bl      c00000000059a2f8 <__list_add_valid+0x8>

Register state from the WARN_ON():

  GPR00: c00000000059a3a0 c000007ff23afb50 c000000001344e00 0000000000000075
  GPR04: 0000000000000000 0000000000000000 0000001852af8bc1 0000000000000000
  GPR08: 0000000000000001 0000000000000007 0000000000000006 00000000000004aa
  GPR12: 0000000000000000 c000007ffffeb080 c000000000137038 c000005ff62aaa00
  GPR16: 0000000000000000 0000000000000000 c000007fffbe7600 c000007fffbe7370
  GPR20: c000007fffbe7320 c000007fffbe7300 c000000001373a00 0000000000000000
  GPR24: fffffffffffffef7 c00000000012e320 c000007ff23afcb0 c000000000cb8628
  GPR28: c00000000121c8b8 c000000001214e00 c000007fef5b17e8 c000007fef5b17c0

The watchpoint hit at 0xc000000000136bec:

  addis r29,r2,-19
   => r29 = 0xc000000001344e00 + (-19 << 16)
   => r29 = 0xc000000001214e00

  ld r29,31424(r29)
   => r29 = *(0xc000000001214e00 + 31424)
   => r29 = *(0xc00000000121c8c0)

0xc00000000121c8c0 is where the watchpoint was placed, and thus this instruction was emulated by emulate_step. But because handle_dabr_fault did not restore the emulated register state, r29 still contains a stale value in the register state above.

Fixes: 5aae8a53 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: stable@vger.kernel.org # 2.6.36+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 30 Apr, 2019 1 commit
-
Nicholas Piggin authored
Reimplement Book3S idle code in C, moving the POWER7/8/9 implementation-specific HV idle code to the powernv platform code. Book3S assembly stubs are kept in common code and used only to save the stack frame and non-volatile GPRs before executing architected idle instructions, and to restore the stack and reload GPRs before returning to C after waking from idle. The complex logic dealing with threads and subcores, locking, SPRs, HMIs, timebase resync, etc., is all done in C, which makes it more maintainable.

This is not a strict translation to C code; there are some significant differences:

- Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs, but saves and restores them itself.
- The optimisation where EC=ESL=0 idle modes did not have to save GPRs or change MSR is restored, because it is now simple to do. ESL=1 sleeps that do not lose GPRs can use this optimisation too.
- KVM secondary entry and cede is now more of a call/return style rather than branchy. nap_state_lost is not required, because KVM always returns via the NVGPR-restoring path.
- The KVM secondary wakeup-from-offline sequence is moved entirely into the offline wakeup, which avoids a hwsync in the normal idle wakeup path.

Performance, measured with context-switch ping-pong on different threads or cores, is possibly improved a small amount, 1-3% depending on stop state and core vs thread test for shallow states. For deep states it is in the noise compared with other latencies.

KVM improvements:

- Idle sleepers now always return to the caller rather than branching out to KVM first.
- This allows optimisations like very fast return to the caller when no state has been lost.
- KVM no longer requires nap_state_lost, because it controls NVGPR save/restore itself on the way in and out.
- The heavy idle-wakeup KVM request check can be moved out of the normal host idle code and into the not-performance-critical offline code.
- KVM nap code now returns from where it is called, which makes the flow a bit easier to follow.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Squash the KVM changes in]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
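A rough sketch of the kind of stub that survives in assembly: save LR and the non-volatile GPRs, call into C, and restore them on the wake-up return path. The frame size, offsets, and the C entry point name are all illustrative; assumes -mregnames:

        .set    FRAME_SIZE, 176

        idle_stub_sketch:
                mflr    r0
                std     r0, 16(r1)              /* save LR in the caller's frame (ELFv2) */
                stdu    r1, -FRAME_SIZE(r1)     /* new stack frame */
                off = 32
                .irp n, 14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
                std     \n, off(r1)             /* save non-volatile GPRs r14..r31 */
                off = off + 8
                .endr
                bl      idle_stop_c             /* C handles SPRs, locking, HMIs, timebase (name illustrative) */
                off = 32
                .irp n, 14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
                ld      \n, off(r1)             /* reload GPRs on the wake-up path */
                off = off + 8
                .endr
                addi    r1, r1, FRAME_SIZE
                ld      r0, 16(r1)
                mtlr    r0
                blr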
-
- 21 Apr, 2019 1 commit
-
Michael Ellerman authored
Kernel Userspace Access Prevention utilises a feature of the Radix MMU which disallows read and write access to userspace addresses. By utilising this, the kernel is prevented from accessing user data from outside of trusted paths that perform proper safety checks, such as copy_{to/from}_user() and friends.

Userspace access is disabled from early boot and is only enabled when performing an operation like copy_{to/from}_user(). The register that controls this (AMR) does not prevent userspace from accessing itself, so there is no need to save and restore it when entering and exiting userspace. When entering the kernel from the kernel, we save the AMR and, if it is not blocking user access (because, e.g., we faulted doing a user access), we re-block user access for the duration of the exception (i.e., the page fault) and then restore the AMR when returning back to the kernel.

This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y) and performing the following:

  # (echo ACCESS_USERSPACE) > [debugfs]/provoke-crash/DIRECT

If enabled, this should send SIGSEGV to the thread. We also add paranoid checking of the AMR in switch and syscall return under CONFIG_PPC_KUAP_DEBUG.

Co-authored-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
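A sketch of what "blocking user access" means at the register level: the AMR's two high bits deny read and write for key 0, which radix KUAP applies to user pages. The value is believed to match the AMR_KUAP_BLOCKED constant this patch introduces; assumes -mregnames:

        .set    SPRN_AMR, 29

        li      r10, 3
        sldi    r10, r10, 62            /* 0xc000000000000000: deny read + write for key 0 */
        mtspr   SPRN_AMR, r10
        isync                           /* context synchronise before depending on the new AMR */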
-
- 08 Apr, 2019 1 commit
-
Nicholas Piggin authored
Commit 48e7b769 ("powerpc/64s/hash: Convert SLB miss handlers to C") broke the radix-mode segment exception handler. In radix mode, this exception is not an SLB miss; rather, it signals that the EA is outside the range translated by any page table. The commit lost the radix feature alternate code patch, which can cause faults on some EAs to hit the kernel BUG at arch/powerpc/mm/slb.c:639! The original radix code would send faults to slb_miss_large_addr, which would end up faulting due to slb_addr_limit being 0. This patch sends radix directly to do_bad_slb_fault, which is a bit clearer.

Fixes: 48e7b769 ("powerpc/64s/hash: Convert SLB miss handlers to C")
Cc: stable@vger.kernel.org # v4.20+
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
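The restored dispatch relies on runtime feature patching. A sketch using the kernel's feature-fixup macros; this fragment assumes the kernel's asm/feature-fixups.h and asm/mmu.h headers rather than being standalone, and the hash-side label is illustrative:

        BEGIN_MMU_FTR_SECTION
                b       do_hash_segment_fault   /* hash: a real SLB miss (label illustrative) */
        MMU_FTR_SECTION_ELSE
                b       do_bad_slb_fault        /* radix: EA outside any translated range */
        ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)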
-