- 14 Apr, 2021 6 commits
-
-
Li Huafei authored
The sparse tool complains as follows: arch/powerpc/kernel/security.c:253:6: warning: symbol 'stf_barrier' was not declared. Should it be static? This symbol is not used outside of security.c, so this commit marks it static. Signed-off-by: Li Huafei <lihuafei1@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210408033951.28369-1-lihuafei1@huawei.com
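A minimal sketch of the change, assuming stf_barrier is the bool flag declared in security.c (shown as a diff-style fragment, not the exact hunk):

    -bool stf_barrier;
    +static bool stf_barrier;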
-
Christophe Leroy authored
On book3s/32, the segment below kernel text is used for module allocation when CONFIG_STRICT_KERNEL_RWX is defined. In order to benefit from the powerpc specific module_alloc() function, which allocates modules within 32 Mbytes of the end of kernel text, use that segment below PAGE_OFFSET at all times. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a46dcdd39a9e80b012d86c294c4e5cd8d31665f3.1617283827.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
On the 8xx, TASK_SIZE is 0x80000000. The space between TASK_SIZE and PAGE_OFFSET is not used. In order to benefit from the powerpc specific module_alloc() function, which allocates modules within 32 Mbytes of the end of kernel text, define MODULES_VADDR and MODULES_END. Set a 256 MB area just below PAGE_OFFSET, as on book3s/32. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a225606d5b3a8bc53fe612ad52c855c60b0a0a58.1617283827.git.christophe.leroy@csgroup.eu
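A minimal sketch of what such 8xx definitions could look like, assuming the 256 MB region is carved directly below PAGE_OFFSET (the exact header placement is an assumption; SZ_256M comes from linux/sizes.h):

    #define MODULES_END     PAGE_OFFSET
    #define MODULES_VADDR   (MODULES_END - SZ_256M)   /* 256 MB module area just below kernel */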
-
Christophe Leroy authored
On book3s/32, when STRICT_KERNEL_RWX is selected, modules are allocated on the segment just before kernel text, i.e. on the 0xb0000000-0xbfffffff segment when PAGE_OFFSET is 0xc0000000. On the 8xx, TASK_SIZE is 0x80000000. The space between TASK_SIZE and PAGE_OFFSET is not used and could be used for modules. The idea comes from the ARM architecture. Having modules just below PAGE_OFFSET offers an opportunity to minimise the distance between kernel text and modules and avoid trampolines in modules to access kernel functions or other module functions. When MODULES_VADDR is defined, powerpc has its own module_alloc() function. In that function, first try to allocate the module above the limit defined by '_etext - 32M'. Then if the allocation fails, fall back to the entire MODULES area. DEBUG logs in module_32.c without the patch: [ 1572.588822] module_32: Applying ADD relocate section 13 to 12 [ 1572.588891] module_32: Doing plt for call to 0xc00671a4 at 0xcae04024 [ 1572.588964] module_32: Initialized plt for 0xc00671a4 at cae04000 [ 1572.589037] module_32: REL24 value = CAE04000. location = CAE04024 [ 1572.589110] module_32: Location before: 48000001. [ 1572.589171] module_32: Location after: 4BFFFFDD. [ 1572.589231] module_32: ie. jump to 03FFFFDC+CAE04024 = CEE04000 [ 1572.589317] module_32: Applying ADD relocate section 15 to 14 [ 1572.589386] module_32: Doing plt for call to 0xc00671a4 at 0xcadfc018 [ 1572.589457] module_32: Initialized plt for 0xc00671a4 at cadfc000 [ 1572.589529] module_32: REL24 value = CADFC000. location = CADFC018 [ 1572.589601] module_32: Location before: 48000000. [ 1572.589661] module_32: Location after: 4BFFFFE8. [ 1572.589723] module_32: ie. jump to 03FFFFE8+CADFC018 = CEDFC000 With the patch: [ 279.404671] module_32: Applying ADD relocate section 13 to 12 [ 279.404741] module_32: REL24 value = C00671B4. location = BF808024 [ 279.404814] module_32: Location before: 48000001. [ 279.404874] module_32: Location after: 4885F191. [ 279.404933] module_32: ie. jump to 0085F190+BF808024 = C00671B4 [ 279.405016] module_32: Applying ADD relocate section 15 to 14 [ 279.405085] module_32: REL24 value = C00671B4. location = BF800018 [ 279.405156] module_32: Location before: 48000000. [ 279.405215] module_32: Location after: 4886719C. [ 279.405275] module_32: ie. jump to 0086719C+BF800018 = C00671B4 We see that with the patch, no PLT entries are set. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0c3d5cb8a4dfdf6ca1b8aeb385c01470d6628d55.1617283827.git.christophe.leroy@csgroup.eu
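A rough sketch of the allocation strategy described above (illustrative only; the __module_alloc helper name is an assumption, not the merged implementation):

    void *module_alloc(unsigned long size)
    {
            unsigned long limit = (unsigned long)_etext - SZ_32M;
            void *ptr = NULL;

            /* First try to stay within 32 Mbytes of the end of kernel text,
             * so REL24 branches reach the kernel without PLT trampolines */
            if (MODULES_VADDR < PAGE_OFFSET && MODULES_END > limit)
                    ptr = __module_alloc(size, limit, MODULES_END);

            /* Otherwise fall back to the entire MODULES area */
            if (!ptr)
                    ptr = __module_alloc(size, MODULES_VADDR, MODULES_END);

            return ptr;
    }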
-
Vaibhav Jain authored
While removing a large number of mappings from the hash page tables on large memory systems, a soft-lockup is reported because of the time spent inside htab_remove_mapping(), like the one below: watchdog: BUG: soft lockup - CPU#8 stuck for 23s! <snip> NIP plpar_hcall+0x38/0x58 LR pSeries_lpar_hpte_invalidate+0x68/0xb0 Call Trace: 0x1fffffffffff000 (unreliable) pSeries_lpar_hpte_removebolted+0x9c/0x230 hash__remove_section_mapping+0xec/0x1c0 remove_section_mapping+0x28/0x3c arch_remove_memory+0xfc/0x150 devm_memremap_pages_release+0x180/0x2f0 devm_action_release+0x30/0x50 release_nodes+0x28c/0x300 device_release_driver_internal+0x16c/0x280 unbind_store+0x124/0x170 drv_attr_store+0x44/0x60 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 __vfs_write+0x3c/0x70 vfs_write+0xd4/0x270 ksys_write+0xdc/0x130 system_call+0x5c/0x70 Fix this by adding a cond_resched() to the loop in htab_remove_mapping() that issues the hcall to remove the hpte mapping. The call to cond_resched() is issued every HZ jiffies, which should prevent the soft-lockup from being reported. Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210404163148.321346-1-vaibhav@linux.ibm.com
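A hedged sketch of the shape of the fix, assuming a local jiffies-based timeout (variable names here are illustrative, not the exact ones used in htab_remove_mapping()):

    unsigned long timeout = jiffies + HZ;

    for (i = 0; i < count; i++) {
            /* hcall to invalidate one bolted HPTE */
            mmu_hash_ops.hpte_removebolted(vaddr, psize, ssize);

            /* yield roughly once every HZ jiffies to avoid a soft-lockup */
            if (time_after(jiffies, timeout)) {
                    cond_resched();
                    timeout = jiffies + HZ;
            }
    }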
-
Shivaprasad G Bhat authored
Add support for the ND_REGION_ASYNC capability if the device tree indicates the 'ibm,hcall-flush-required' property in the NVDIMM node. Flush is done by issuing the H_SCM_FLUSH hcall to the hypervisor. If the flush request fails, the hypervisor is expected to reflect the problem in the subsequent nvdimm H_SCM_HEALTH call. This patch prevents mmap of namespaces with the MAP_SYNC flag if the nvdimm requires an explicit flush[1]. References: [1] https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Use unsigned long / long instead of uint64_t/int64_t] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/161703936121.36.7260632399582101498.stgit@e1fbed493c87
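A sketch of how such a flush hcall loop typically looks on pseries, retrying while the hypervisor reports busy (the papr_scm function and field names here are assumptions, not the merged code):

    static int papr_scm_flush(struct papr_scm_priv *p)
    {
            unsigned long ret_buf[PLPAR_HCALL_BUFSIZE], token = 0;
            long rc;

            do {
                    rc = plpar_hcall(H_SCM_FLUSH, ret_buf, p->drc_index, token);
                    token = ret_buf[0];

                    /* Hypervisor asks us to back off and retry */
                    if (H_IS_LONG_BUSY(rc)) {
                            msleep(get_longbusy_msecs(rc));
                            rc = H_BUSY;
                    } else if (rc == H_BUSY) {
                            cond_resched();
                    }
            } while (rc == H_BUSY);

            return rc == H_SUCCESS ? 0 : -EIO;
    }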
-
- 12 Apr, 2021 1 commit
-
-
Christophe Leroy authored
Add missing fault exit label in unsafe_copy_from_user() in order to avoid following build failure with CONFIG_SPE CC arch/powerpc/kernel/signal_32.o arch/powerpc/kernel/signal_32.c: In function 'restore_user_regs': arch/powerpc/kernel/signal_32.c:565:36: error: macro "unsafe_copy_from_user" requires 4 arguments, but only 3 given 565 | ELF_NEVRREG * sizeof(u32)); | ^ In file included from ./include/linux/uaccess.h:11, from ./include/linux/sched/task.h:11, from ./include/linux/sched/signal.h:9, from ./include/linux/rcuwait.h:6, from ./include/linux/percpu-rwsem.h:7, from ./include/linux/fs.h:33, from ./include/linux/huge_mm.h:8, from ./include/linux/mm.h:707, from arch/powerpc/kernel/signal_32.c:17: ./arch/powerpc/include/asm/uaccess.h:428: note: macro "unsafe_copy_from_user" defined here 428 | #define unsafe_copy_from_user(d, s, l, e) \ | arch/powerpc/kernel/signal_32.c:564:3: error: 'unsafe_copy_from_user' undeclared (first use in this function); did you mean 'raw_copy_from_user'? 564 | unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs, | ^~~~~~~~~~~~~~~~~~~~~ | raw_copy_from_user arch/powerpc/kernel/signal_32.c:564:3: note: each undeclared identifier is reported only once for each function it appears in make[3]: *** [arch/powerpc/kernel/signal_32.o] Error 1 Fixes: 627b72be ("powerpc/signal32: Convert restore_[tm]_user_regs() to user access block") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/aad2cb1801a3cc99bc27081022925b9fc18a0dfb.1618159169.git.christophe.leroy@csgroup.eu
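The fix boils down to passing the fault exit label as the fourth argument the macro expects; a sketch of the corrected call site (the 'failed' label name is an assumption based on the surrounding user access block):

    unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs,
                          ELF_NEVRREG * sizeof(u32), failed);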
-
- 08 Apr, 2021 18 commits
-
-
Christophe Leroy authored
For an unknown reason, old commit d27dfd388715 ("Import pre2.0.8") changed 'ptrdiff_t' from 'int' to 'long'. GCC really expects it to be 'int', and this leads to the following warning when building KFENCE: CC mm/kfence/report.o In file included from ./include/linux/printk.h:7, from ./include/linux/kernel.h:16, from mm/kfence/report.c:10: mm/kfence/report.c: In function 'kfence_report_error': ./include/linux/kern_levels.h:5:18: warning: format '%td' expects argument of type 'ptrdiff_t', but argument 6 has type 'long int' [-Wformat=] 5 | #define KERN_SOH "\001" /* ASCII Start Of Header */ | ^~~~~~ ./include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH' 11 | #define KERN_ERR KERN_SOH "3" /* error conditions */ | ^~~~~~~~ ./include/linux/printk.h:343:9: note: in expansion of macro 'KERN_ERR' 343 | printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__) | ^~~~~~~~ mm/kfence/report.c:213:3: note: in expansion of macro 'pr_err' 213 | pr_err("Out-of-bounds %s at 0x%p (%luB %s of kfence-#%td):\n", | ^~~~~~ <uapi/asm-generic/posix_types.h> defines it as 'int', and defines 'size_t' and 'ssize_t' exactly as powerpc does, so remove the powerpc specific definitions and fall back on the generic ones. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e43d133bf52fa19e577f64f3a3a38cedc570377d.1617616601.git.christophe.leroy@csgroup.eu
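For reference, the generic definitions being fallen back on look roughly like this (a paraphrase of include/uapi/asm-generic/posix_types.h, not a verbatim quote):

    #if __BITS_PER_LONG != 64
    typedef unsigned int     __kernel_size_t;
    typedef int              __kernel_ssize_t;
    typedef int              __kernel_ptrdiff_t;   /* 'int', which is what GCC expects for %td */
    #else
    typedef __kernel_ulong_t __kernel_size_t;
    typedef __kernel_long_t  __kernel_ssize_t;
    typedef __kernel_long_t  __kernel_ptrdiff_t;
    #endif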
-
Randy Dunlap authored
When neither CONFIG_PCI nor CONFIG_IBMVIO is set/enabled, iommu.c has a build error. The fault injection code is not useful in that kernel config, so make the FAIL_IOMMU option depend on PCI || IBMVIO. Prevents this build error (warning escalated to error): ../arch/powerpc/kernel/iommu.c:178:30: error: 'fail_iommu_bus_notifier' defined but not used [-Werror=unused-variable] 178 | static struct notifier_block fail_iommu_bus_notifier = { Fixes: d6b9a81b ("powerpc: IOMMU fault injection") Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210404192623.10697-1-rdunlap@infradead.org
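A sketch of the resulting Kconfig dependency (the prompt string and surrounding options are paraphrased, not quoted from the tree):

    config FAIL_IOMMU
            bool "Fault-injection capability for IOMMU"
            depends on FAULT_INJECTION
            depends on PCI || IBMVIO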
-
Yang Li authored
Eliminate the following coccicheck warning: ./arch/powerpc/platforms/pseries/lpar.c:1633:2-3: Unneeded semicolon Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1617672785-81372-1-git-send-email-yang.lee@linux.alibaba.com
-
Nicholas Piggin authored
There is no need for this to be in asm; use the new interrupt entry wrapper. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210406025508.821718-1-npiggin@gmail.com
-
Athira Rajeev authored
The power PMU group constraints include a check for EBB events to make sure all events in a group agree on EBB. This prevents scheduling EBB and non-EBB events together. But in the existing check, the settings for the constraint mask and value are interchanged. This patch fixes that. Before the patch, the PMU selftest "cpu_event_pinned_vs_ebb_test" fails with the below in the dmesg logs. This happens because an EBB event gets enabled along with a non-EBB cpu event. [35600.453346] cpu_event_pinne[41326]: illegal instruction (4) at 10004a18 nip 10004a18 lr 100049f8 code 1 in cpu_event_pinned_vs_ebb_test[10000000+10000] Test results after the patch: $ ./pmu/ebb/cpu_event_pinned_vs_ebb_test test: cpu_event_pinned_vs_ebb tags: git_version:v5.12-rc5-93-gf28c3125acd3-dirty Binding to cpu 8 EBB Handler is at 0x100050c8 read error on event 0x7fffe6bd4040! PM_RUN_INST_CMPL: result 9872 running/enabled 37930432 success: cpu_event_pinned_vs_ebb This bug was hidden by other logic until commit 1908dc91 (perf: Tweak perf_event_attr::exclusive semantics). Fixes: 4df48999 ("powerpc/perf: Add power8 EBB support") Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Mention commit 1908dc91] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1617725761-1464-1-git-send-email-atrajeev@linux.vnet.ibm.com
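In other words, the bug has this shape: the EBB mask bits were OR'ed into the value and the event's EBB setting into the mask. A hedged sketch of the swap (the CNST_EBB_* macro names follow the isa207 constraint code but are quoted from memory):

    /* before (buggy): mask and value interchanged */
    mask  |= CNST_EBB_VAL(ebb);
    value |= CNST_EBB_MASK;

    /* after: constrain on the mask, record the event's EBB setting in the value */
    mask  |= CNST_EBB_MASK;
    value |= CNST_EBB_VAL(ebb);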
-
Jordan Niethe authored
The suggested alternative for getting cache-inhibited memory with 'mem=' and /dev/mem is pretty hacky. Also, PAPR guests do not allow system memory to be mapped cache-inhibited, so despite /dev/mem being available this will not work, which can cause confusion. Instead, recommend using the memtrace buffers. memtrace is only available on powernv, so there is no chance of trying to do this in a guest. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210225032108.1458352-2-jniethe5@gmail.com
-
Jordan Niethe authored
Allow the memory that is removed from the linear mapping for use as trace buffers to be mmapped. This is a useful way of providing cache-inhibited memory for the alignment_handler selftest. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: make memtrace_mmap() static as noticed by lkp@intel.com] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210225032108.1458352-1-jniethe5@gmail.com
-
Michael Ellerman authored
As best as I can tell the ".machine" directive in trampoline_64.S is no longer, or never was, necessary. It was added in commit 0d976313 ("powerpc: Add purgatory for kexec_file_load() implementation."), which created the file based on the kexec-tools purgatory. It may be/have-been necessary in the kexec-tools version, but we have a completely different build system, and we already pass the desired CPU flags, eg: gcc ... -m64 -Wl,-a64 -mabi=elfv2 -Wa,-maltivec -Wa,-mpower4 -Wa,-many ... arch/powerpc/purgatory/trampoline_64.S So drop the ".machine" directive and rely on the assembler flags. Reported-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/r/20210315034159.315675-1-mpe@ellerman.id.au
-
Michael Ellerman authored
When the original spectre/meltdown mitigations were merged we put them in setup_64.c for lack of a better place. Since then we created security.c for some of the other mitigation related code. But it should all be in there. This sort of code movement can cause trouble for backports, but hopefully this code is relatively stable these days (famous last words). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210326101201.1973552-1-mpe@ellerman.id.au
-
Michael Ellerman authored
We have now fixed the known bugs in STRICT_KERNEL_RWX for Book3S 64-bit Hash and Radix MMUs, see preceding commits, so allow the option to be selected again. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331003845.216246-6-mpe@ellerman.id.au
-
Michael Ellerman authored
When we enabled STRICT_KERNEL_RWX we received some reports of boot failures when using the Hash MMU and running under phyp. The crashes are intermittent, and often exhibit as a completely unresponsive system, or possibly an oops. One example, which was caught in xmon: [ 14.068327][ T1] devtmpfs: mounted [ 14.069302][ T1] Freeing unused kernel memory: 5568K [ 14.142060][ T347] BUG: Unable to handle kernel instruction fetch [ 14.142063][ T1] Run /sbin/init as init process [ 14.142074][ T347] Faulting instruction address: 0xc000000000004400 cpu 0x2: Vector: 400 (Instruction Access) at [c00000000c7475e0] pc: c000000000004400: exc_virt_0x4400_instruction_access+0x0/0x80 lr: c0000000001862d4: update_rq_clock+0x44/0x110 sp: c00000000c747880 msr: 8000000040001031 current = 0xc00000000c60d380 paca = 0xc00000001ec9de80 irqmask: 0x03 irq_happened: 0x01 pid = 347, comm = kworker/2:1 ... enter ? for help [c00000000c747880] c0000000001862d4 update_rq_clock+0x44/0x110 (unreliable) [c00000000c7478f0] c000000000198794 update_blocked_averages+0xb4/0x6d0 [c00000000c7479f0] c000000000198e40 update_nohz_stats+0x90/0xd0 [c00000000c747a20] c0000000001a13b4 _nohz_idle_balance+0x164/0x390 [c00000000c747b10] c0000000001a1af8 newidle_balance+0x478/0x610 [c00000000c747be0] c0000000001a1d48 pick_next_task_fair+0x58/0x480 [c00000000c747c40] c000000000eaab5c __schedule+0x12c/0x950 [c00000000c747cd0] c000000000eab3e8 schedule+0x68/0x120 [c00000000c747d00] c00000000016b730 worker_thread+0x130/0x640 [c00000000c747da0] c000000000174d50 kthread+0x1a0/0x1b0 [c00000000c747e10] c00000000000e0f0 ret_from_kernel_thread+0x5c/0x6c This shows that CPU 2, which was idle, woke up and then appears to randomly take an instruction fault on a completely valid area of kernel text. The cause turns out to be the call to hash__mark_rodata_ro(), late in boot. Due to the way we layout text and rodata, that function actually changes the permissions for all of text and rodata to read-only plus execute. To do the permission change we use a hypervisor call, H_PROTECT. On phyp that appears to be implemented by briefly removing the mapping of the kernel text, before putting it back with the updated permissions. If any other CPU is executing during that window, it will see spurious faults on the kernel text and/or data, leading to crashes. To fix it we use stop machine to collect all other CPUs, and then have them drop into real mode (MMU off), while we change the mapping. That way they are unaffected by the mapping temporarily disappearing. We don't see this bug on KVM because KVM always use VPM=1, where faults are directed to the hypervisor, and the fault will be serialised vs the h_protect() by HPTE_V_HVLOCK. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331003845.216246-5-mpe@ellerman.id.au
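A hedged sketch of the shape of the fix, using stop_machine() to park every other CPU while one CPU issues the H_PROTECT updates (the parms structure, the wait_in_real_mode() helper and the change_memory_range() loop are illustrative names, not the merged code):

    struct change_memory_parms {
            unsigned long start, end, newpp;
            int master_cpu;
    };

    static int change_memory_range_fn(void *data)
    {
            struct change_memory_parms *parms = data;

            /* Secondary CPUs drop to real mode (MMU off) and spin, so the brief
             * window where H_PROTECT unmaps the kernel text cannot fault them. */
            if (smp_processor_id() != parms->master_cpu)
                    return wait_in_real_mode(parms);

            /* Only the chosen CPU walks the range calling hpte_updateboltedpp() */
            change_memory_range(parms->start, parms->end, parms->newpp);
            return 0;
    }

    /* Collect every online CPU before changing the bolted kernel mapping */
    stop_machine(change_memory_range_fn, &parms, cpu_online_mask);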
-
Michael Ellerman authored
Pull the loop calling hpte_updateboltedpp() out of hash__change_memory_range() into a helper function. We need it to be a separate function for the next patch. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331003845.216246-4-mpe@ellerman.id.au
-
Michael Ellerman authored
In hash__mark_rodata_ro() we pass the raw PP_RXXX value to hash__change_memory_range(). That has the effect of setting the key to zero, because PP_RXXX contains no key value. Fix it by using htab_convert_pte_flags(), which knows how to convert a pgprot into a pp value, including the key. Fixes: d94b827e ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Daniel Axtens <dja@axtens.net> Link: https://lore.kernel.org/r/20210331003845.216246-3-mpe@ellerman.id.au
-
Michael Ellerman authored
The flags argument to plpar_pte_protect() (aka. H_PROTECT) includes the key in bits 9-13, but currently we always set those bits to zero. In the past that hasn't been a problem because we always used key 0 for the kernel, and updateboltedpp() is only used for kernel mappings. However since commit d94b827e ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation") we are now inadvertently changing the key (to zero) when we call plpar_pte_protect(). That hasn't broken anything because updateboltedpp() is only used for STRICT_KERNEL_RWX, which is currently disabled on 64s due to other bugs. But we want to fix that, so first we need to pass the key correctly to plpar_pte_protect(). We can't pass our newpp value in directly; we have to convert it into the form expected by the hcall. The hcall we're using here is H_PROTECT, which is specified in section 14.5.4.1.6 of LoPAPR v1.1. It takes a `flags` parameter, and the description for flags says: * flags: AVPN, pp0, pp1, pp2, key0-key4, n, and for the CMO option: CMO Option flags as defined in Table 189. If you then go to the start of the parent section, 14.5.4.1, on page 405, it says: Register Linkage (For hcall() tokens 0x04 - 0x18) * On Call * R3 function call token * R4 flags (see Table 178, “Page Frame Table Access flags field definition,” on page 401) Then you have to go to section 14.5.3, and on page 394 there is a list of hcalls and their tokens (table 176), and there you can see that H_PROTECT == 0x18. Finally you can look at table 178, on page 401, where it specifies the layout of the bits for the key: Bit Function ----------------- 50-54 | key0-key4 Those are big-endian bit numbers; converting to normal bit numbers you get bits 9-13, or 0x3e00. In the kernel we have: #define HPTE_R_KEY_HI ASM_CONST(0x3000000000000000) #define HPTE_R_KEY_LO ASM_CONST(0x0000000000000e00) So the LO bits of newpp are already in the right place, and the HI bits need to be shifted down by 48. Fixes: d94b827e ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331003845.216246-2-mpe@ellerman.id.au
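A minimal sketch of the resulting conversion in the updateboltedpp path, based on the constants quoted above (not the exact merged hunk):

    /* HPTE_R_KEY_LO (0xe00) already lines up with the H_PROTECT key bits 9-13;
     * HPTE_R_KEY_HI (0x3000000000000000) must be shifted down by 48. */
    flags = newpp & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
    flags |= (newpp & HPTE_R_KEY_HI) >> 48;

    rc = plpar_pte_protect(flags, slot, 0);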
-
Michael Ellerman authored
In the past we had a fallback definition for _PAGE_KERNEL_ROX, but we removed that in commit d82fd29c ("powerpc/mm: Distribute platform specific PAGE and PMD flags and definitions") and added definitions for each MMU family. However we missed adding a definition for 64s, which was not really a bug because it's currently not used. But we'd like to use PAGE_KERNEL_ROX in a future patch so add a definition now. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331003845.216246-1-mpe@ellerman.id.au
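A sketch of the kind of definition added for 64s, following the pattern of the existing _PAGE_KERNEL_* flags (the exact flag combination is quoted from memory):

    #define _PAGE_KERNEL_ROX  (_PAGE_PRIVILEGED | _PAGE_READ | _PAGE_EXEC)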
-
Jordan Niethe authored
Previously, when mapping kernel memory on radix, no ptesync was included, which would periodically lead to unhandled spurious faults. Mapping kernel memory is used when code patching with Strict RWX enabled. As suggested by Chris Riedl, turning ftrace on and off does a large amount of code patching, so it is a convenient way to see this kind of fault. Add a selftest to try and trigger this kind of spurious fault. It tests for 30 seconds, which is usually long enough for the issue to show up. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Rename it to better reflect what it does, rather than the symptom] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208032957.1232102-2-jniethe5@gmail.com
-
Jordan Niethe authored
When adding a PTE a ptesync is needed to order the update of the PTE with subsequent accesses otherwise a spurious fault may be raised. radix__set_pte_at() does not do this for performance gains. For non-kernel memory this is not an issue as any faults of this kind are corrected by the page fault handler. For kernel memory these faults are not handled. The current solution is that there is a ptesync in flush_cache_vmap() which should be called when mapping from the vmalloc region. However, map_kernel_page() does not call flush_cache_vmap(). This is troublesome in particular for code patching with Strict RWX on radix. In do_patch_instruction() the page frame that contains the instruction to be patched is mapped and then immediately patched. With no ordering or synchronization between setting up the PTE and writing to the page it is possible for faults. As the code patching is done using __put_user_asm_goto() the resulting fault is obscured - but using a normal store instead it can be seen: BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c Faulting instruction address: 0xc00000000008bd74 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: nop_module(PO+) [last unloaded: nop_module] CPU: 4 PID: 757 Comm: sh Tainted: P O 5.10.0-rc5-01361-ge3c1b78c8440-dirty #43 NIP: c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810 REGS: c000000016f634a0 TRAP: 0300 Tainted: P O (5.10.0-rc5-01361-ge3c1b78c8440-dirty) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44002884 XER: 00000000 CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1 This results in the kind of issue reported here: https://lore.kernel.org/linuxppc-dev/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/ Chris Riedl suggested a reliable way to reproduce the issue: $ mount -t debugfs none /sys/kernel/debug $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) & Turning ftrace on and off does a large amount of code patching which in usually less then 5min will crash giving a trace like: ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000) ------------[ ftrace bug ]------------ ftrace failed to modify [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390 actual: 11:3b:47:4b Setting ftrace call site to call ftrace function ftrace record flags: 80000001 (1) expected tramp: c00000000006c96c ------------[ cut here ]------------ WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8 Modules linked in: nop_module(PO-) [last unloaded: nop_module] CPU: 4 PID: 809 Comm: sh Tainted: P O 5.10.0-rc5-01360-gf878ccaf250a #1 NIP: c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0 REGS: c000000004c8b760 TRAP: 0700 Tainted: P O (5.10.0-rc5-01360-gf878ccaf250a) MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28008848 XER: 20040000 CFAR: c0000000001a9c98 IRQMASK: 0 GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022 GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8 GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118 GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000 GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008 GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8 GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020 GPR28: c000000000ff1b68 
c000000000bf8e5c c00000000312f700 c000000000fbb9b0 NIP ftrace_bug+0x28c/0x2e8 LR ftrace_bug+0x288/0x2e8 Call Trace: ftrace_bug+0x288/0x2e8 (unreliable) ftrace_modify_all_code+0x168/0x210 arch_ftrace_update_code+0x18/0x30 ftrace_run_update_code+0x44/0xc0 ftrace_startup+0xf8/0x1c0 register_ftrace_function+0x4c/0xc0 function_trace_init+0x80/0xb0 tracing_set_tracer+0x2a4/0x4f0 tracing_set_trace_write+0xd4/0x130 vfs_write+0xf0/0x330 ksys_write+0x84/0x140 system_call_exception+0x14c/0x230 system_call_common+0xf0/0x27c To fix this, use a ptesync when updating kernel memory PTEs. Fixes: f1cb8f9b ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Tidy up change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208032957.1232102-1-jniethe5@gmail.com
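A minimal sketch of the fix in the radix kernel-mapping path (the surrounding function is elided; the ptesync barrier is the point):

    set_pte_at(&init_mm, ea, ptep, pte);

    /* Order the PTE update before any subsequent access to the new mapping,
     * since kernel mappings have no fault handler to recover a spurious fault */
    asm volatile("ptesync" : : : "memory");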
-
Bhaskar Chowdhury authored
Various spelling/typo fixes. Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 03 Apr, 2021 15 commits
-
-
Christophe Leroy authored
Convert powerpc to relative jump labels. Before the patch, pseries_defconfig vmlinux.o has: 9074 __jump_table 0003f2a0 0000000000000000 0000000000000000 01321fa8 2**0 With the patch, the same config gets: 9074 __jump_table 0002a0e0 0000000000000000 0000000000000000 01321fb4 2**0 Size is 258720 without the patch, 172256 with the patch. That's a 33% size reduction. Largely copied from commit c296146c ("arm64/kernel: jump_label: Switch to relative references") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/828348da7868eda953ce023994404dfc49603b64.1616514473.git.christophe.leroy@csgroup.eu
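The saving comes from the generic relative jump_entry layout that powerpc can now use: each entry shrinks from three absolute 64-bit words to two 32-bit offsets plus one long, i.e. 24 bytes down to 16 on 64-bit (a paraphrase of include/linux/jump_label.h, not a verbatim quote):

    struct jump_entry {
            s32 code;     /* location of the nop/branch, relative to &code */
            s32 target;   /* branch target, relative to &target */
            long key;     /* struct static_key address, relative to &key; low bits carry flags */
    };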
-
Christophe Leroy authored
When the BPF routine doesn't call any function, the non volatile registers can be reallocated to volatile registers in order to avoid having to save them/restore on the stack. Before this patch, the test #359 ADD default X is: 0: 7c 64 1b 78 mr r4,r3 4: 38 60 00 00 li r3,0 8: 94 21 ff b0 stwu r1,-80(r1) c: 60 00 00 00 nop 10: 92 e1 00 2c stw r23,44(r1) 14: 93 01 00 30 stw r24,48(r1) 18: 93 21 00 34 stw r25,52(r1) 1c: 93 41 00 38 stw r26,56(r1) 20: 39 80 00 00 li r12,0 24: 39 60 00 00 li r11,0 28: 3b 40 00 00 li r26,0 2c: 3b 20 00 00 li r25,0 30: 7c 98 23 78 mr r24,r4 34: 7c 77 1b 78 mr r23,r3 38: 39 80 00 42 li r12,66 3c: 39 60 00 00 li r11,0 40: 7d 8c d2 14 add r12,r12,r26 44: 39 60 00 00 li r11,0 48: 7d 83 63 78 mr r3,r12 4c: 82 e1 00 2c lwz r23,44(r1) 50: 83 01 00 30 lwz r24,48(r1) 54: 83 21 00 34 lwz r25,52(r1) 58: 83 41 00 38 lwz r26,56(r1) 5c: 38 21 00 50 addi r1,r1,80 60: 4e 80 00 20 blr After this patch, the same test has become: 0: 7c 64 1b 78 mr r4,r3 4: 38 60 00 00 li r3,0 8: 94 21 ff b0 stwu r1,-80(r1) c: 60 00 00 00 nop 10: 39 80 00 00 li r12,0 14: 39 60 00 00 li r11,0 18: 39 00 00 00 li r8,0 1c: 38 e0 00 00 li r7,0 20: 7c 86 23 78 mr r6,r4 24: 7c 65 1b 78 mr r5,r3 28: 39 80 00 42 li r12,66 2c: 39 60 00 00 li r11,0 30: 7d 8c 42 14 add r12,r12,r8 34: 39 60 00 00 li r11,0 38: 7d 83 63 78 mr r3,r12 3c: 38 21 00 50 addi r1,r1,80 40: 4e 80 00 20 blr Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b94562d7d2bb21aec89de0c40bb3cd91054b65a2.1616430991.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
Implement Extended Berkeley Packet Filter on Powerpc 32 Test result with test_bpf module: test_bpf: Summary: 378 PASSED, 0 FAILED, [354/366 JIT'ed] Registers mapping: [BPF_REG_0] = r11-r12 /* function arguments */ [BPF_REG_1] = r3-r4 [BPF_REG_2] = r5-r6 [BPF_REG_3] = r7-r8 [BPF_REG_4] = r9-r10 [BPF_REG_5] = r21-r22 (Args 9 and 10 come in via the stack) /* non volatile registers */ [BPF_REG_6] = r23-r24 [BPF_REG_7] = r25-r26 [BPF_REG_8] = r27-r28 [BPF_REG_9] = r29-r30 /* frame pointer aka BPF_REG_10 */ [BPF_REG_FP] = r17-r18 /* eBPF jit internal registers */ [BPF_REG_AX] = r19-r20 [TMP_REG] = r31 As PPC32 doesn't have a redzone in the stack, a stack frame must always be set in order to host at least the tail count counter. The stack frame remains for tail calls, it is set by the first callee and freed by the last callee. r0 is used as temporary register as much as possible. It is referenced directly in the code in order to avoid misusing it, because some instructions interpret it as value 0 instead of register r0 (ex: addi, addis, stw, lwz, ...) The following operations are not implemented: case BPF_ALU64 | BPF_DIV | BPF_X: /* dst /= src */ case BPF_ALU64 | BPF_MOD | BPF_X: /* dst %= src */ case BPF_STX | BPF_XADD | BPF_DW: /* *(u64 *)(dst + off) += src */ The following operations are only implemented for power of two constants: case BPF_ALU64 | BPF_MOD | BPF_K: /* dst %= imm */ case BPF_ALU64 | BPF_DIV | BPF_K: /* dst /= imm */ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/61d8b149176ddf99e7d5cef0b6dc1598583ca202.1616430991.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
The following opcodes will be needed for the implementation of eBPF for PPC32. Add them in asm/ppc-opcode.h PPC_RAW_ADDE PPC_RAW_ADDZE PPC_RAW_ADDME PPC_RAW_MFLR PPC_RAW_ADDIC PPC_RAW_ADDIC_DOT PPC_RAW_SUBFC PPC_RAW_SUBFE PPC_RAW_SUBFIC PPC_RAW_SUBFZE PPC_RAW_ANDIS PPC_RAW_NOR Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f7bd573a368edd78006f8a5af508c726e7ce1ed2.1616430991.git.christophe.leroy@csgroup.eu
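A sketch of the kind of macros added, following the existing PPC_RAW_* pattern in ppc-opcode.h (the opcode values are quoted from memory and should be checked against the ISA):

    #define PPC_RAW_ADDE(t, a, b)  (0x7c000114 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b))
    #define PPC_RAW_MFLR(t)        (0x7c0802a6 | ___PPC_RT(t))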
-
Christophe Leroy authored
Because PPC32 will use more non-volatile registers, move the SEEN_ flags to positions 0-2, which correspond to special registers. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/608faa1dc3ecfead649e15392abd07b00313d2ba.1616430991.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
Move into bpf_jit_comp.c the functions that will remain common to PPC64 and PPC32 when we add support of EBPF for PPC32. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2c339d77fb168ef12b213ccddfee3cb6c8ce8ae1.1616430991.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
Move the functions bpf_flush_icache(), bpf_is_seen_register() and bpf_set_seen_register() in order to reuse them in the future bpf_jit_comp32.c. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/28e8d5a75e64807d7e9d39a4b52658755e259f8c.1616430991.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
Instead of using BPF register number as input in functions bpf_set_seen_register() and bpf_is_seen_register(), use CPU register number directly. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0cd2506f598e7095ea43e62dca1f472de5474a0d.1616430991.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
At the time being, PPC32 has Classical BPF support. The test_bpf module exhibits some failure: test_bpf: #298 LD_IND byte frag jited:1 ret 202 != 66 FAIL (1 times) test_bpf: #299 LD_IND halfword frag jited:1 ret 51958 != 17220 FAIL (1 times) test_bpf: #301 LD_IND halfword mixed head/frag jited:1 ret 51958 != 1305 FAIL (1 times) test_bpf: #303 LD_ABS byte frag jited:1 ret 202 != 66 FAIL (1 times) test_bpf: #304 LD_ABS halfword frag jited:1 ret 51958 != 17220 FAIL (1 times) test_bpf: #306 LD_ABS halfword mixed head/frag jited:1 ret 51958 != 1305 FAIL (1 times) test_bpf: Summary: 371 PASSED, 7 FAILED, [119/366 JIT'ed] Fixing this is not worth the effort. Instead, remove support for classical BPF and prepare for adding Extended BPF support instead. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fbc3e4fcc9c8f6131d6c705212530b2aa50149ee.1616430991.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
Same spirit as commit debf122c ("powerpc/signal32: Simplify logging in handle_rt_signal32()"), remove this intermediate 'addr' local var. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/638fa99530beb29f82f94370057d110e91272acc.1616151715.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
Add unsafe_get_user_sigset() and transform PPC32 get_sigset_t() into an unsafe version unsafe_get_sigset_t(). Then convert do_setcontext() and do_setcontext_tm() to use user_read_access_begin/end. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9273ba664db769b8d9c7540ae91395e346e4945e.1616151715.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
Convert restore_user_regs() and restore_tm_user_regs() to use user_read_access_begin/end blocks. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/181adf15a6f644efcd1aeafb355f3578ff1b6bc5.1616151715.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
In restore_tm_user_regs(), regroup the reads from 'sr' and the ones from 'tm_sr' together in order to allow two block user accesses in following patch. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7c518b9a4c8e5ae9a3bfb647bc8b20bf820233af.1616151715.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
In preparation of using user_access_begin/end in restore_user_regs(), move the access_ok() inside the function. It makes no difference as the behaviour on a failed access_ok() is the same as on failed restore_user_regs(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c106eb2f37c3040f1fd38b40e50c670feb7cb835.1616151715.git.christophe.leroy@csgroup.eu
-
Christophe Leroy authored
In the same spirit as commit f1cf4f93 ("powerpc/signal32: Remove ifdefery in middle of if/else") MSR_TM_ACTIVE() is always defined and returns always 0 when CONFIG_PPC_TRANSACTIONAL_MEM is not selected, so the awful ifdefery in the middle of an if/else can be removed. Make 'msr_hi' a 'long long' to avoid build failure on PPC32 due to the 32 bits left shift. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a4b48b2f0be1ef13fc8e57452b7f8350da28d521.1616151715.git.christophe.leroy@csgroup.eu
-