- 05 Jun, 2018 6 commits
-
-
Gautham R. Shenoy authored
The commit 78eaa10f ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") introduced a timeout for the snooze idle state so that it could be eventually be promoted to a deeper idle state. The snooze timeout value is static and set to the target residency of the next idle state, which would train the cpuidle governor to pick the next idle state eventually. The unfortunate side-effect of this is that if the next idle state(s) is disabled, the CPU will forever remain in snooze, despite the fact that the system is completely idle, and other deeper idle states are available. This patch fixes the issue by dynamically setting the snooze timeout to the target residency of the next enabled state on the device. Before Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01297 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 96.41| 0.00| 0.00 0| 8| 1| 96.43| 0.00| 0.00 0| 8| 2| 96.47| 0.00| 0.00 0| 8| 3| 96.35| 0.00| 0.00 0| 8| 4| 96.37| 0.00| 0.00 0| 8| 5| 96.37| 0.00| 0.00 0| 8| 6| 96.47| 0.00| 0.00 0| 8| 7| 96.47| 0.00| 0.00 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.05033 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 16| 0| 89.79| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 1| 90.12| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 2| 90.21| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 3| 90.29| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 After Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01200 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 16.58| 0.00| 77.21 0| 8| 1| 18.42| 0.00| 75.38 0| 8| 2| 4.70| 0.00| 94.09 0| 8| 3| 17.06| 0.00| 81.73 0| 8| 4| 3.06| 0.00| 95.73 0| 8| 5| 7.00| 0.00| 96.80 0| 8| 6| 1.00| 0.00| 98.79 0| 8| 7| 5.62| 0.00| 94.17 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.02110 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 0| 0| 0.69| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 9.39| 89.70 0| 0| 1| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.05| 93.21 0| 0| 2| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 89.93 0| 0| 3| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 93.26 Fixes: 78eaa10f ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Commit 2479bfc9 ("powerpc: Fix build by disabling attribute-alias warning for SYSCALL_DEFINEx") forgot arch/powerpc/kernel/pci_32.c Latest GCC version emit the following warnings As arch/powerpc code is built with -Werror, this breaks build with GCC 8.1 This patch inhibits this warning In file included from arch/powerpc/kernel/pci_32.c:14: ./include/linux/syscalls.h:233:18: error: 'sys_pciconfig_iobase' alias between functions of incompatible types 'long int(long int, long unsigned int, long unsigned int)' and 'long int(long int, long int, long int)' [-Werror=attribute-alias] asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ ^~~ ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) ^~~~~~~~~~~~~~~~~ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Wei Yongjun authored
Add the missing unlock before return from function afu_ioctl_enable_p9_wait() in the error handling case. Fixes: e948e06f ("ocxl: Expose the thread_id needed for wait on POWER9") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Alastair D'Silva <alastair@d-silva.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Colin Ian King authored
Trivial fix to spelling mistake in hmi_error_types text Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Colin Ian King authored
Trivial fix to spelling mistake in bootx_printf message text Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Ram Pai authored
Disassociate the exec_key from a VMA if the VMA permission is not PROT_EXEC anymore. Otherwise the exec_only key continues to be associated with the vma, causing unexpected behavior. The problem was reported on x86 by Shakeel Butt, which is also applicable on powerpc. Fixes: 5586cf61 ("powerpc: introduce execute-only pkey") Cc: stable@vger.kernel.org # v4.16+ Reported-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 04 Jun, 2018 1 commit
-
-
Haren Myneni authored
NX can set the 3rd bit in CR register for XER[SO] (Summary overflow) which is not related to paste request. The current paste function returns failure for a successful request when this bit is set. So mask this bit and check the proper return status. Fixes: 2392c8c8 ("powerpc/powernv/vas: Define copy/paste interfaces") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 03 Jun, 2018 33 commits
-
-
Mark Greer authored
There are no longer any platforms that use Marvell's mv64x60 hostbridges so remove the supporting kernel code. CC: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Mark Greer <mgreer@animalcreek.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Mark Greer authored
There are no longer any platforms that use Marvell's mv64x60 hostbridges so remove the supporting boot code. Signed-off-by: Mark Greer <mgreer@animalcreek.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Mark Greer authored
There are no longer any platforms that use Marvell's mv64x60's i2c controller so remove its driver. Signed-off-by: Mark Greer <mgreer@animalcreek.com> Acked-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Mark Greer <<a href="mailto:mgreer@animalcreek.com">mgreer@animalcreek.com</a>><br> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Mark Greer authored
There are no longer any platforms that use Marvell's MPSC serial controller so remove its driver. Signed-off-by: Mark Greer <mgreer@animalcreek.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Mark Greer authored
The C2K platform appears to be orphaned so remove code supporting it. CC: Remi Machet <rmachet@nvidia.com> Signed-off-by: Mark Greer <mgreer@animalcreek.com> Acked-by: Remi Machet <remi@machet.us> Signed-off-by: Mark Greer <mgreer@animalcreek.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
At the time being, memcmp() compares two chunks of memory byte per byte. This patch optimises the comparison by comparing word by word. On the same way as commit 15c2d45d ("powerpc: Add 64bit optimised memcmp"), this patch moves memcmp() into a dedicated file named memcmp_32.S A small benchmark performed on an 8xx comparing two chuncks of 512 bytes performed 100000 times gives: Before : 5852274 TB ticks After: 1488638 TB ticks This is almost 4 times faster Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Rewrite clear_user() on the same principle as memset(0), making use of dcbz to clear complete cache lines. This code is a copy/paste of memset(), with some modifications in order to retrieve remaining number of bytes to be cleared, as it needs to be returned in case of error. On the same way as done on PPC64 in commit 17968fbb ("powerpc: 64bit optimised __clear_user"), the patch moves __clear_user() into a dedicated file string_32.S On a MPC885, throughput is almost doubled: Before: ~# dd if=/dev/zero of=/dev/null bs=1M count=1000 1048576000 bytes (1000.0MB) copied, 18.990779 seconds, 52.7MB/s After: ~# dd if=/dev/zero of=/dev/null bs=1M count=1000 1048576000 bytes (1000.0MB) copied, 9.611468 seconds, 104.0MB/s On a MPC8321, throughput is multiplied by 2.12: Before: root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000 1048576000 bytes (1000.0MB) copied, 6.844352 seconds, 146.1MB/s After: root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000 1048576000 bytes (1000.0MB) copied, 3.218854 seconds, 310.7MB/s Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
arch_vtime_task_switch() is a small function which is called only from vtime_common_task_switch(), so it is worth inlining Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
When compiled with GCC 8.1, vmlinux is significantly bigger than with GCC 4.8. When looking at the generated code with objdump, we notice that all functions and loops when a 16 bytes alignment. This significantly increases the size of the kernel. It is pointless and even counterproductive as on the 8xx 'nop' also consumes one clock cycle. Size of vmlinux with GCC 4.8: text data bss dec hex filename 5801948 1626076b 457796 7885820 7853fc vmlinux Size of vmlinux with GCC 8.1: text data bss dec hex filename 6764592 1630652 456476 8851720 871108 vmlinux Size of vmlinux with GCC 8.1 and this patch: text data bss dec hex filename 6331544 1631756 456476 8419776 8079c0 vmlinux Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
The generic csum_ipv6_magic() generates a pretty bad result 00000000 <csum_ipv6_magic>: (PPC32) 0: 81 23 00 00 lwz r9,0(r3) 4: 81 03 00 04 lwz r8,4(r3) 8: 7c e7 4a 14 add r7,r7,r9 c: 7d 29 38 10 subfc r9,r9,r7 10: 7d 4a 51 10 subfe r10,r10,r10 14: 7d 27 42 14 add r9,r7,r8 18: 7d 2a 48 50 subf r9,r10,r9 1c: 80 e3 00 08 lwz r7,8(r3) 20: 7d 08 48 10 subfc r8,r8,r9 24: 7d 4a 51 10 subfe r10,r10,r10 28: 7d 29 3a 14 add r9,r9,r7 2c: 81 03 00 0c lwz r8,12(r3) 30: 7d 2a 48 50 subf r9,r10,r9 34: 7c e7 48 10 subfc r7,r7,r9 38: 7d 4a 51 10 subfe r10,r10,r10 3c: 7d 29 42 14 add r9,r9,r8 40: 7d 2a 48 50 subf r9,r10,r9 44: 80 e4 00 00 lwz r7,0(r4) 48: 7d 08 48 10 subfc r8,r8,r9 4c: 7d 4a 51 10 subfe r10,r10,r10 50: 7d 29 3a 14 add r9,r9,r7 54: 7d 2a 48 50 subf r9,r10,r9 58: 81 04 00 04 lwz r8,4(r4) 5c: 7c e7 48 10 subfc r7,r7,r9 60: 7d 4a 51 10 subfe r10,r10,r10 64: 7d 29 42 14 add r9,r9,r8 68: 7d 2a 48 50 subf r9,r10,r9 6c: 80 e4 00 08 lwz r7,8(r4) 70: 7d 08 48 10 subfc r8,r8,r9 74: 7d 4a 51 10 subfe r10,r10,r10 78: 7d 29 3a 14 add r9,r9,r7 7c: 7d 2a 48 50 subf r9,r10,r9 80: 81 04 00 0c lwz r8,12(r4) 84: 7c e7 48 10 subfc r7,r7,r9 88: 7d 4a 51 10 subfe r10,r10,r10 8c: 7d 29 42 14 add r9,r9,r8 90: 7d 2a 48 50 subf r9,r10,r9 94: 7d 08 48 10 subfc r8,r8,r9 98: 7d 4a 51 10 subfe r10,r10,r10 9c: 7d 29 2a 14 add r9,r9,r5 a0: 7d 2a 48 50 subf r9,r10,r9 a4: 7c a5 48 10 subfc r5,r5,r9 a8: 7c 63 19 10 subfe r3,r3,r3 ac: 7d 29 32 14 add r9,r9,r6 b0: 7d 23 48 50 subf r9,r3,r9 b4: 7c c6 48 10 subfc r6,r6,r9 b8: 7c 63 19 10 subfe r3,r3,r3 bc: 7c 63 48 50 subf r3,r3,r9 c0: 54 6a 80 3e rotlwi r10,r3,16 c4: 7c 63 52 14 add r3,r3,r10 c8: 7c 63 18 f8 not r3,r3 cc: 54 63 84 3e rlwinm r3,r3,16,16,31 d0: 4e 80 00 20 blr 0000000000000000 <.csum_ipv6_magic>: (PPC64) 0: 81 23 00 00 lwz r9,0(r3) 4: 80 03 00 04 lwz r0,4(r3) 8: 81 63 00 08 lwz r11,8(r3) c: 7c e7 4a 14 add r7,r7,r9 10: 7f 89 38 40 cmplw cr7,r9,r7 14: 7d 47 02 14 add r10,r7,r0 18: 7d 30 10 26 mfocrf r9,1 1c: 55 29 f7 fe rlwinm r9,r9,30,31,31 20: 7d 4a 4a 14 add r10,r10,r9 24: 7f 80 50 40 cmplw cr7,r0,r10 28: 7d 2a 5a 14 add r9,r10,r11 2c: 80 03 00 0c lwz r0,12(r3) 30: 81 44 00 00 lwz r10,0(r4) 34: 7d 10 10 26 mfocrf r8,1 38: 55 08 f7 fe rlwinm r8,r8,30,31,31 3c: 7d 29 42 14 add r9,r9,r8 40: 81 04 00 04 lwz r8,4(r4) 44: 7f 8b 48 40 cmplw cr7,r11,r9 48: 7d 29 02 14 add r9,r9,r0 4c: 7d 70 10 26 mfocrf r11,1 50: 55 6b f7 fe rlwinm r11,r11,30,31,31 54: 7d 29 5a 14 add r9,r9,r11 58: 7f 80 48 40 cmplw cr7,r0,r9 5c: 7d 29 52 14 add r9,r9,r10 60: 7c 10 10 26 mfocrf r0,1 64: 54 00 f7 fe rlwinm r0,r0,30,31,31 68: 7d 69 02 14 add r11,r9,r0 6c: 7f 8a 58 40 cmplw cr7,r10,r11 70: 7c 0b 42 14 add r0,r11,r8 74: 81 44 00 08 lwz r10,8(r4) 78: 7c f0 10 26 mfocrf r7,1 7c: 54 e7 f7 fe rlwinm r7,r7,30,31,31 80: 7c 00 3a 14 add r0,r0,r7 84: 7f 88 00 40 cmplw cr7,r8,r0 88: 7d 20 52 14 add r9,r0,r10 8c: 80 04 00 0c lwz r0,12(r4) 90: 7d 70 10 26 mfocrf r11,1 94: 55 6b f7 fe rlwinm r11,r11,30,31,31 98: 7d 29 5a 14 add r9,r9,r11 9c: 7f 8a 48 40 cmplw cr7,r10,r9 a0: 7d 29 02 14 add r9,r9,r0 a4: 7d 70 10 26 mfocrf r11,1 a8: 55 6b f7 fe rlwinm r11,r11,30,31,31 ac: 7d 29 5a 14 add r9,r9,r11 b0: 7f 80 48 40 cmplw cr7,r0,r9 b4: 7d 29 2a 14 add r9,r9,r5 b8: 7c 10 10 26 mfocrf r0,1 bc: 54 00 f7 fe rlwinm r0,r0,30,31,31 c0: 7d 29 02 14 add r9,r9,r0 c4: 7f 85 48 40 cmplw cr7,r5,r9 c8: 7c 09 32 14 add r0,r9,r6 cc: 7d 50 10 26 mfocrf r10,1 d0: 55 4a f7 fe rlwinm r10,r10,30,31,31 d4: 7c 00 52 14 add r0,r0,r10 d8: 7f 80 30 40 cmplw cr7,r0,r6 dc: 7d 30 10 26 mfocrf r9,1 e0: 55 29 ef fe rlwinm r9,r9,29,31,31 e4: 7c 09 02 14 add r0,r9,r0 e8: 54 03 80 3e rotlwi r3,r0,16 ec: 7c 03 02 14 add r0,r3,r0 f0: 7c 03 00 f8 not r3,r0 f4: 78 63 84 22 rldicl r3,r3,48,48 f8: 4e 80 00 20 blr This patch implements it in assembly for both PPC32 and PPC64 Link: https://github.com/linuxppc/linux/issues/9Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Improve __csum_partial by interleaving loads and adds. On a 8xx, it brings neither improvement nor degradation. On a 83xx, it brings a 25% improvement. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
commit 87a156fb ("Align hot loops of some string functions") degraded the performance of string functions by adding useless nops A simple benchmark on an 8xx calling 100000x a memchr() that matches the first byte runs in 41668 TB ticks before this patch and in 35986 TB ticks after this patch. So this gives an improvement of approx 10% Another benchmark doing the same with a memchr() matching the 128th byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks after this patch, so regardless on the number of loops, removing those useless nops improves the test by 5683 TB ticks. Fixes: 87a156fb ("Align hot loops of some string functions") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Use fault_in_pages_readable() to prefault user context instead of open coding Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
The 885 familly processors don't have the Real Time Clock Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Variable div is set but never used. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
reloc_offset() is the same as add_reloc_offset(0) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
The current implementation of from64to32() gives a poor result: 0000000000000270 <.from64to32>: 270: 38 00 ff ff li r0,-1 274: 78 69 00 22 rldicl r9,r3,32,32 278: 78 00 00 20 clrldi r0,r0,32 27c: 7c 60 00 38 and r0,r3,r0 280: 7c 09 02 14 add r0,r9,r0 284: 78 09 00 22 rldicl r9,r0,32,32 288: 7c 00 4a 14 add r0,r0,r9 28c: 78 03 00 20 clrldi r3,r0,32 290: 4e 80 00 20 blr This patch modifies from64to32() to operate in the same spirit as csum_fold() It swaps the two 32-bit halves of sum then it adds it with the unswapped sum. If there is a carry from adding the two 32-bit halves, it will carry from the lower half into the upper half, giving us the correct sum in the upper half. The resulting code is: 0000000000000260 <.from64to32>: 260: 78 60 00 02 rotldi r0,r3,32 264: 7c 60 1a 14 add r3,r0,r3 268: 78 63 00 22 rldicl r3,r3,32,32 26c: 4e 80 00 20 blr Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
stale_map[] bits are only set in steal_context_smp() so on UP processors this map is useless. Only manage it for SMP processors. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
last_context is 16 on the 8xx, 65535 on the 47x and 255 on other ones. The kernel is exclusively built for the 8xx, for the 47x or for another processor so the last context can be defined as a constant depending on the processor. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Reformat old comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
no_selective_tlbil hence the use of either steal_all_contexts() or steal_context_up() depends on the subarch, it won't change during run. Only the 8xx uses steal_all_contexts and CONFIG_PPC_8xx is exclusive of other processors. This patch replaces the test of no_selective_tlbil global var by a test of CONFIG_PPC_8xx selection. It avoids the test and removes unnecessary code. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
First context is now 1 for all supported platforms, so it can be made a constant. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Direction is already checked in all calling functions in include/linux/dma-mapping.h and also in called function __dma_sync() So really no need to check it once more here. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Ravi Bangoria authored
emulate_step() tests are failing if VSX is not supported or disabled. emulate_step_test: lxvd2x : FAIL emulate_step_test: stxvd2x : FAIL If !CPU_FTR_VSX, emulate_step() failure is expected and testcase should PASS with a valid justification. After patch: emulate_step_test: lxvd2x : PASS (!CPU_FTR_VSX) emulate_step_test: stxvd2x : PASS (!CPU_FTR_VSX) Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Ravi Bangoria authored
emulate_step() is not checking runtime VSX feature flag before emulating an instruction. This is causing kernel crash when kernel is compiled with CONFIG_VSX=y but running on a machine where VSX is not supported or disabled. Ex, while running emulate_step tests on P6 machine: Oops: Exception in kernel mode, sig: 4 [#1] NIP [c000000000095c24] .load_vsrn+0x28/0x54 LR [c000000000094bdc] .emulate_loadstore+0x167c/0x17b0 Call Trace: 0x40fe240c7ae147ae (unreliable) .emulate_loadstore+0x167c/0x17b0 .emulate_step+0x25c/0x5bc .test_lxvd2x_stxvd2x+0x64/0x154 .test_emulate_step+0x38/0x4c .do_one_initcall+0x5c/0x2c0 .kernel_init_freeable+0x314/0x4cc .kernel_init+0x24/0x160 .ret_from_kernel_thread+0x58/0xb4 With fix: emulate_step_test: lxvd2x : FAIL emulate_step_test: stxvd2x : FAIL Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Ravi Bangoria authored
Replace 'op->type & INSTR_TYPE_MASK' expression with GETTYPE(op->type) macro. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Neuling authored
This tests perf hardware breakpoints (ie PERF_TYPE_BREAKPOINT) on powerpc. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michal Suchanek authored
We now have barrier_nospec as mitigation so print it in cpu_show_spectre_v1() when enabled. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Our syscall entry is done in assembly so patch in an explicit barrier_nospec. Based on a patch by Michal Suchanek. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Based on the x86 commit doing the same. See commit 304ec1b0 ("x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec") and b3bbfb3f ("x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec") for more detail. In all cases we are ordering the load from the potentially user-controlled pointer vs a previous branch based on an access_ok() check or similar. Base on a patch from Michal Suchanek. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michal Suchanek authored
Check what firmware told us and enable/disable the barrier_nospec as appropriate. We err on the side of enabling the barrier, as it's no-op on older systems, see the comment for more detail. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michal Suchanek authored
Note that unlike RFI which is patched only in kernel the nospec state reflects settings at the time the module was loaded. Iterating all modules and re-patching every time the settings change is not implemented. Based on lwsync patching. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michal Suchanek authored
Based on the RFI patching. This is required to be able to disable the speculation barrier. Only one barrier type is supported and it does nothing when the firmware does not enable it. Also re-patching modules is not supported So the only meaningful thing that can be done is patching out the speculation barrier at boot when the user says it is not wanted. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michal Suchanek authored
A no-op form of ori (or immediate of 0 into r31 and the result stored in r31) has been re-tasked as a speculation barrier. The instruction only acts as a barrier on newer machines with appropriate firmware support. On older CPUs it remains a harmless no-op. Implement barrier_nospec using this instruction. mpe: The semantics of the instruction are believed to be that it prevents execution of subsequent instructions until preceding branches have been fully resolved and are no longer executing speculatively. There is no further documentation available at this time. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-