- 16 Jul, 2020 24 commits
-
-
Nathan Lynch authored
Previous changes have made it so these flags are never changed; enforce this by making them const. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-6-nathanl@linux.ibm.com
-
Nathan Lynch authored
Since the topology_updates_enabled flag is now always false, remove it and the code which has become unreachable. This is the minimum change that prevents 'defined but unused' warnings emitted by the compiler after stubbing out the start/stop_topology_updates() functions. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-5-nathanl@linux.ibm.com
-
Nathan Lynch authored
Remove the /proc/powerpc/topology_updates interface and the topology_updates=on/off command line argument. The internal topology_updates_enabled flag remains for now, but always false. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-4-nathanl@linux.ibm.com
-
Nathan Lynch authored
Partition suspension, used for hibernation and migration, requires that the OS place all but one of the LPAR's processor threads into one of two states prior to calling the ibm,suspend-me RTAS function: * the architected offline state (via RTAS stop-self); or * the H_JOIN hcall, which does not return until the partition resumes execution Using H_CEDE as the offline mode, introduced by commit 3aa565f5 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state"), means that any threads which are offline from Linux's point of view must be moved to one of those two states before a partition suspension can proceed. This was eventually addressed in commit 120496ac ("powerpc: Bring all threads online prior to migration/hibernation"), which added code to temporarily bring up any offline processor threads so they can call H_JOIN. Conceptually this is fine, but the implementation has had multiple races with cpu hotplug operations initiated from user space[1][2][3], the error handling is fragile, and it generates user-visible cpu hotplug events which is a lot of noise for a platform feature that's supposed to minimize disruption to workloads. With commit 3aa565f5 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state") reverted, this code becomes unnecessary, so remove it. Since any offline CPUs now are truly offline from the platform's point of view, it is no longer necessary to bring up CPUs only to have them call H_JOIN and then go offline again upon resuming. Only active threads are required to call H_JOIN; stopped threads can be left alone. [1] commit a6717c01 ("powerpc/rtas: use device model APIs and serialization during LPM") [2] commit 9fb60305 ("powerpc/rtas: retry when cpu offline races with suspend/migration") [3] commit dfd718a2 ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration") Fixes: 120496ac ("powerpc: Bring all threads online prior to migration/hibernation") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-3-nathanl@linux.ibm.com
-
Nathan Lynch authored
This effectively reverts commit 3aa565f5 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state"), which added an offline mode for CPUs which uses the H_CEDE hcall instead of the architected stop-self RTAS function in order to facilitate "folding" of dedicated mode processors on PowerVM platforms to achieve energy savings. This has been the default offline mode since its introduction. There's nothing about stop-self that would prevent the hypervisor from achieving the energy savings available via H_CEDE, so the original premise of this change appears to be flawed. I also have encountered the claim that the transition to and from ceded state is much faster than stop-self/start-cpu. Certainly we would not want to use stop-self as an *idle* mode. That is what H_CEDE is for. However, this difference is insignificant in the context of Linux CPU hotplug, where the latency of an offline or online operation on current systems is on the order of 100ms, mainly attributable to all the various subsystems' cpuhp callbacks. The cede offline mode also prevents accurate accounting, as discussed before: https://lore.kernel.org/linuxppc-dev/1571740391-3251-1-git-send-email-ego@linux.vnet.ibm.com/ Unconditionally use stop-self to offline processor threads. This is the architected method for offlining CPUs on PAPR systems. The "cede_offline" boot parameter is rendered obsolete. Removing this code enables the removal of the partition suspend code which temporarily onlines all present CPUs. Fixes: 3aa565f5 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-2-nathanl@linux.ibm.com
-
Nicholas Piggin authored
If both count cache and link stack are to be flushed, and can be flushed with the special bcctr, patch that in directly to the flush/branch nop site. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-7-npiggin@gmail.com
-
Nicholas Piggin authored
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-6-npiggin@gmail.com
-
Nicholas Piggin authored
Branch cache flushing code patching has inter-dependencies on both the link stack and the count cache flushing state. To make the code clearer and to separate the link stack and count cache handling, split the "toggle" (setting up variables and printing enable/disable) from the code patching. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Always print something, even if the flush is disabled] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-5-npiggin@gmail.com
-
Nicholas Piggin authored
Make the count-cache and link-stack messages look the same Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-4-npiggin@gmail.com
-
Nicholas Piggin authored
Prepare to allow for hardware link stack flushing by using the none/sw/hw type, same as the count cache state. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-3-npiggin@gmail.com
-
Nicholas Piggin authored
The count cache flush mostly refers to both count cache and link stack flushing. As a first step to untangling these a bit, re-name the bits that apply to both. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-2-npiggin@gmail.com
-
Nicholas Piggin authored
When a FP/VEC/VSX unavailable fault loads registers and enables the facility in the MSR, re-set the lazy restore counters to 1 rather than incrementing them so every fault gets the same number of restores before the next fault. This probably shouldn't be a practical change because if a lazy counter was non-zero then it should have been restored and would not cause a fault when userspace tries to access it. However the code and comment implies otherwise so that's misleading and unnecessary. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200623234139.2262227-3-npiggin@gmail.com
-
Nicholas Piggin authored
Before returning to user, if there are missing FP/VEC/VSX bits from the user MSR then those registers had been saved and must be restored again before use. restore_math will decide whether to restore immediately, or skip the restore and let fp/vec/vsx unavailable faults demand load the registers. Each time restore_math restores one of the FP/VSX or VEC register sets is loaded, an 8-bit counter is incremented (load_fp and load_vec). When these wrap to zero, restore_math no longer restores that register set until after they are next demand faulted. It's quite usual for those counters to have different values, so if one wraps to zero and restore_math no longer restores its registers or user MSR bit but the other is not zero yet does not need to be restored (because the kernel is not frequently using the FPU), then restore_math will be called and it will also not return in the early exit check. This causes msr_check_and_set to test and set the MSR at every kernel exit despite having no work to do. This can cause workloads (e.g., a NULL syscall microbenchmark) to run fast for a time while both counters are non-zero, then slow down when one of the counters reaches zero, then speed up again after the second counter reaches zero. The cost is significant, about 10% slowdown on a NULL syscall benchmark, and the jittery behaviour is very undesirable. Fix this by having restore_math test all conditions first, and only update MSR if we will be loading registers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200623234139.2262227-2-npiggin@gmail.com
-
Nicholas Piggin authored
The TM test in restore_math added by commit dc16b553 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") is no longer necessary after commit a8318c13 ("powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts"), which removed the cases where restore_math has to restore if TM is active. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200623234139.2262227-1-npiggin@gmail.com
-
Aneesh Kumar K.V authored
With kernel now supporting new pmem flush/sync instructions, we can now enable the kernel to initialize the device. On P10 these devices would appear with a new compatible string. For PAPR device we have compatible "ibm,pmemory-v2" and for OF pmem device we have compatible "pmem-region-v2" Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200701072235.223558-8-aneesh.kumar@linux.ibm.com
-
Aneesh Kumar K.V authored
nvdimm expect the flush routines to just mark the cache clean. The barrier that mark the store globally visible is done in nvdimm_flush(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200701072235.223558-7-aneesh.kumar@linux.ibm.com
-
Aneesh Kumar K.V authored
pmem on POWER10 can now use phwsync instead of hwsync to ensure all previous writes are architecturally visible for the platform buffer flush. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200701072235.223558-6-aneesh.kumar@linux.ibm.com
-
Aneesh Kumar K.V authored
Architectures like ppc64 provide persistent memory specific barriers that will ensure that all stores for which the modifications are written to persistent storage by preceding dcbfps and dcbstps instructions have updated persistent storage before any data access or data transfer caused by subsequent instructions is initiated. This is in addition to the ordering done by wmb() Update nvdimm core such that architecture can use barriers other than wmb to ensure all previous writes are architecturally visible for the platform buffer flush. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200701072235.223558-5-aneesh.kumar@linux.ibm.com
-
Aneesh Kumar K.V authored
Start using dcbstps; phwsync; sequence for flushing persistent memory range. The new instructions are implemented as a variant of dcbf and hwsync and on P8 and P9 they will be executed as those instructions. We avoid using them on older hardware. This helps to avoid difficult to debug bugs. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200701072235.223558-4-aneesh.kumar@linux.ibm.com
-
Aneesh Kumar K.V authored
POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps) that can be used to write modified locations back to persistent storage. Additionally, POWER10 also introduce phwsync and plwsync which can be used to establish order of these writes to persistent storage. This patch exposes these instructions to the rest of the kernel. The existing dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate synchronization with OpenCAPI-hosted persistent storage. Hence the new instructions are added as a variant of the old ones that old hardware won't differentiate. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200701072235.223558-3-aneesh.kumar@linux.ibm.com
-
Aneesh Kumar K.V authored
The PAPR based virtualized persistent memory devices are only supported on POWER9 and above. In the followup patch, the kernel will switch the persistent memory cache flush functions to use a new `dcbf` variant instruction. The new instructions even though added in ISA 3.1 works even on P8 and P9 because these are implemented as a variant of existing `dcbf` and `hwsync` and on P8 and P9 behaves as such. Considering these devices are only supported on P8 and above, update the driver to prevent a P7-compat guest from using persistent memory devices. We don't update of_pmem driver with the same condition, because, on bare-metal, the firmware enables pmem support only on P9 and above. There the kernel depends on OPAL firmware to restrict exposing persistent memory related device tree entries on older hardware. of_pmem.ko is written without any arch dependency and we don't want to add ppc64 specific cpu feature check in of_pmem driver. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200701072235.223558-2-aneesh.kumar@linux.ibm.com
-
Nicholas Piggin authored
When platform doesn't support GTSE, let TLB invalidation requests for radix guests be off-loaded to the host using H_RPT_INVALIDATE hcall. [hcall wrapper, error path handling and renames] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200703053608.12884-4-bharata@linux.ibm.com
-
Bharata B Rao authored
H_REGISTER_PROC_TBL asks for GTSE by default. GTSE flag bit should be set only when GTSE is supported. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200703053608.12884-3-bharata@linux.ibm.com
-
Bharata B Rao authored
Make GTSE an MMU feature and enable it by default for radix. However for guest, conditionally enable it if hypervisor supports it via OV5 vector. Let prom_init ask for radix GTSE only if the support exists. Having GTSE as an MMU feature will make it easy to enable radix without GTSE. Currently radix assumes GTSE is enabled by default. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200703053608.12884-2-bharata@linux.ibm.com
-
- 15 Jul, 2020 16 commits
-
-
Christophe Leroy authored
On the same way as already done on PPC32, drop __get_datapage() function and use get_datapage inline macro instead. See commit ec0895f0 ("powerpc/vdso32: inline __get_datapage()") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e13d95312e0b9792556b19b4bb8955cc1ff19fc7.1588079622.git.christophe.leroy@c-s.fr
-
Christophe Leroy authored
Instead of doing a __get_user() from the first and last location into a tmp var which won't be used, use fault_in_pages_readable() Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/810bd8840ef990a200f58c9dea9abe767ca02a3a.1594146723.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
save_general_regs() which does special handling when i == PT_SOFTE. Rewrite it to minimise the specific part, especially the __put_user() and associated error handling is the same so make it common. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Use a regular if rather than ternary operator] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/47a38df46cae5a5a88a558a64d71f75e9c4d9950.1594125164.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
Since commit ("1bd79336 powerpc: Fix various syscall/signal/swapcontext bugs"), getting save_general_regs() called without FULL_REGS() is very unlikely and generates a warning. The 32-bit version of save_general_regs() doesn't take care of it at all and copies all registers anyway since that commit. Moreover, commit 965dd3ad ("powerpc/64/syscall: Remove non-volatile GPR save optimisation") is another reason why it would never happen. So the same with 64-bit, don't worry about FULL_REGS() and copy all registers all the time. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/173de3b659fa3a5f126a0eb170522cccd909950f.1594125164.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
Doing kasan pages allocation in MMU_init is too early, kernel doesn't have access yet to the entire memory space and memblock_alloc() fails when the kernel is a bit big. Do it from kasan_init() instead. Fixes: 2edb16ef ("powerpc/32: Add KASAN support") Fixes: d2a91cef ("powerpc/kasan: Fix shadow pages allocation failure") Cc: stable@vger.kernel.org Reported-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=208181 Link: https://lore.kernel.org/r/63048fcea8a1c02f75429ba3152f80f7853f87fc.1593690707.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
This reverts commit d2a91cef. This commit moved too much work in kasan_init(). The allocation of shadow pages has to be moved for the reason explained in that patch, but the allocation of page tables still need to be done before switching to the final hash table. First revert the incorrect commit, following patch redoes it properly. Fixes: d2a91cef ("powerpc/kasan: Fix shadow pages allocation failure") Cc: stable@vger.kernel.org Reported-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=208181 Link: https://lore.kernel.org/r/3667deb0911affbf999b99f87c31c77d5e870cd2.1593690707.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
Documentation wrongly tells that book3s/32 CPU have hash MMU. 603 and e300 core only have software loaded TLB. 755, 7450 family and e600 core have both hash MMU and software loaded TLB. This can be selected by setting a bit in HID2 (755) or HID0 (others). At the time being this is not supported by the kernel. Make this explicit in the documentation. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/261923c075d1cb49d02493685e8585d4ea2a5197.1593698951.git.christophe.leroy@csgroup.eu
-
Nicholas Piggin authored
Add a testcase that tries to trigger the FPU denormal exception on Power8 or earlier CPUs. Prior to commit 4557ac6b ("powerpc/64s/exception: Fix 0x1500 interrupt handler crash") this would trigger a crash such as: Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ip6table_filter ip6_tables iptable_filter fuse kvm_hv binfmt_misc squashfs mlx4_ib ib_uverbs dm_multipath scsi_dh_rdac scsi_dh_alua ib_core mlx4_en sr_mod cdrom bnx2x lpfc mlx4_core crc_t10dif scsi_transport_fc sg mdio vmx_crypto crct10dif_vpmsum leds_powernv powernv_rng rng_core led_class powernv_op_panel sunrpc ip_tables x_tables autofs4 CPU: 159 PID: 6854 Comm: fpu_denormal Not tainted 5.8.0-rc2-gcc-8.2.0-00092-g4ec7aaab0828 #192 NIP: c0000000000100ec LR: c00000000001b85c CTR: 0000000000000000 REGS: c000001dd818f770 TRAP: 1500 Not tainted (5.8.0-rc2-gcc-8.2.0-00092-g4ec7aaab0828) MSR: 900000000290b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002884 XER: 20000000 CFAR: c00000000001005c IRQMASK: 1 GPR00: c00000000001c4c8 c000001dd818fa00 c00000000171c200 c000001dd8101570 GPR04: 0000000000000000 c000001dd818fe90 c000001dd8101590 000000000000001d GPR08: 0000000000000010 0000000000002000 c000001dd818fe90 fffffffffc48ac60 GPR12: 0000000000002200 c000001ffff4f480 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 00007fffab225b40 0000000000000001 c000000001757168 GPR24: c000001dd8101570 c0000018027b00f0 c000001dd8101570 c000000001496098 GPR28: c00000000174ad05 c000001dd8100000 c000001dd8100000 c000001dd8100000 NIP save_fpu+0xa8/0x2ac LR __giveup_fpu+0x2c/0xd0 Call Trace: 0xc000001dd818fa80 (unreliable) giveup_all+0x118/0x120 __switch_to+0x124/0x6c0 __schedule+0x390/0xaf0 do_task_dead+0x70/0x80 do_exit+0x8fc/0xe10 do_group_exit+0x64/0xd0 sys_exit_group+0x24/0x30 system_call_exception+0x164/0x270 system_call_common+0xf0/0x278 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of fix patch, add oops log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200708074942.1713396-1-npiggin@gmail.com
-
Nicholas Piggin authored
Returning from an interrupt or syscall to a signal handler currently begins execution directly at the handler's entry point, with LR set to the address of the sigreturn trampoline. When the signal handler function returns, it runs the trampoline. It looks like this: # interrupt at user address xyz # kernel stuff... signal is raised rfid # void handler(int sig) addis 2,12,.TOC.-.LCF0@ha addi 2,2,.TOC.-.LCF0@l mflr 0 std 0,16(1) stdu 1,-96(1) # handler stuff ld 0,16(1) mtlr 0 blr # __kernel_sigtramp_rt64 addi r1,r1,__SIGNAL_FRAMESIZE li r0,__NR_rt_sigreturn sc # kernel executes rt_sigreturn rfid # back to user address xyz Note the blr with no matching bl. This can corrupt the return predictor. Solve this by instead resuming execution at the signal trampoline which then calls the signal handler. qtrace-tools link_stack checker confirms the entire user/kernel/vdso cycle is balanced after this patch, whereas it's not upstream. Alan confirms the dwarf unwind info still looks good. gdb still recognises the signal frame and can step into parent frames if it break inside a signal handler. Performance is pretty noisy, not a very significant change on a POWER9 here, but branch misses are consistently a lot lower on a microbenchmark: Performance counter stats for './signal': 13,085.72 msec task-clock # 1.000 CPUs utilized 45,024,760,101 cycles # 3.441 GHz 65,102,895,542 instructions # 1.45 insn per cycle 11,271,673,787 branches # 861.372 M/sec 59,468,979 branch-misses # 0.53% of all branches 12,989.09 msec task-clock # 1.000 CPUs utilized 44,692,719,559 cycles # 3.441 GHz 65,109,984,964 instructions # 1.46 insn per cycle 11,282,136,057 branches # 868.585 M/sec 39,786,942 branch-misses # 0.35% of all branches Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200511101952.1463138-1-npiggin@gmail.com
-
Arnd Bergmann authored
The kernel test robot pointed out a slightly different error message after recent commit 5456ffde ("powerpc/spufs: simplify spufs core dumping") to spufs for a configuration that never worked: powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function `.spufs_proxydma_info_dump': >> file.c:(.text+0x4c68): undefined reference to `.dump_emit' powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function `.spufs_dma_info_dump': file.c:(.text+0x4d70): undefined reference to `.dump_emit' powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function `.spufs_wbox_info_dump': file.c:(.text+0x4df4): undefined reference to `.dump_emit' Add a Kconfig dependency to prevent this from happening again. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200706132302.3885935-1-arnd@arndb.de
-
Oliver O'Halloran authored
pnv_ioda_setup_bus_dma() is only used when a passed through PE is returned to the host. If the kernel is built without IOMMU support this is dead code. Move it under the #ifdef with the rest of the IOMMU API support. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200705133557.443607-2-oohall@gmail.com
-
Oliver O'Halloran authored
The kernel test robot noticed these are non-static which causes Clang to print some warnings. These are called via ppc_md function pointers so there's no need for them to be non-static. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200705133557.443607-1-oohall@gmail.com
-
Abhishek Goel authored
Commit 1961acad removes usage of function "validate_dt_prop_sizes". This patch removes this unused function. Signed-off-by: Abhishek Goel <huntbag@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200706053258.121475-1-huntbag@linux.vnet.ibm.com
-
Srikar Dronamraju authored
Unlike drivers/base/cacheinfo, powerpc cacheinfo code is not exposing shared_cpu_list under /sys/devices/system/cpu/cpu<n>/cache/index<m> Add shared_cpu_list to per cpu per index directory to maintain parity with x86. Some scripts (example: mmtests https://github.com/gormanm/mmtests) seem to be looking for shared_cpu_list instead of shared_cpu_map. Before this patch: # ls /sys/devices/system/cpu0/cache/index1 coherency_line_size number_of_sets size ways_of_associativity level shared_cpu_map type # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map 00ff # After this patch: # ls /sys/devices/system/cpu0/cache/index1 coherency_line_size number_of_sets shared_cpu_map type level shared_cpu_list size ways_of_associativity # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map 00ff # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_list 0-7 # Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200629103703.4538-4-srikar@linux.vnet.ibm.com
-
Srikar Dronamraju authored
In anticipation of implementing shared_cpu_list, move code under shared_cpu_map_show() to a common function. No functional changes. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200629103703.4538-3-srikar@linux.vnet.ibm.com
-
Srikar Dronamraju authored
Tejun Heo had modified shared_cpu_map_show() to use scnprintf instead of cpumap_print during support for *pb[l] format. Refer commit 0c118b7b ("powerpc: use %*pb[l] to print bitmaps including cpumasks and nodemasks"). cpumap_print_to_pagebuf() is a standard function to print cpumap. With commit 9cf79d11 ("bitmap: remove explicit newline handling using scnprintf format string"), there is no need to print explicit newline and trailing null character. cpumap_print_to_pagebuf() internally uses scnprintf(). Hence replace scnprintf() with cpumap_print_to_pagebuf(). Note: shared_cpu_map_show() in drivers/base/cacheinfo.c already uses cpumap_print_to_pagebuf(). Before this patch: # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map 00ff # (Notice the extra blank line). After this patch: # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map 00ff # Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200629103703.4538-2-srikar@linux.vnet.ibm.com
-