- 13 Apr, 2017 2 commits
-
-
Christophe Lombard authored
The two previously fields pid and tid, located in the structure cxl_irq_info, are only used in the guest environment. To avoid confusion, it's not necessary to fill the fields in the bare-metal environment. Pid_tid is now renamed to 'reserved' to avoid undefined behavior on bare-metal. The PSL Process and Thread Identification Register (CXL_PSL_PID_TID_An) is only used when attaching a dedicated process for PSL8 only. This register goes away in CAIA2. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Lombard authored
This bit is used to cause a flash image load for programmable CAIA-compliant implementation. If this bit is set to ‘0’, a power cycle of the adapter is required to load a programmable CAIA-com- pliant implementation from flash. This field will be used by the following patches. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 12 Apr, 2017 6 commits
-
-
Rashmica Gupta authored
The current behaviour of the hash table dump assumes that memory is contiguous and iterates from the start of memory to (start + size of memory). When memory isn't physically contiguous, this doesn't work. If memory exists at 0-5 GB and 6-10 GB then the current approach will check if entries exist in the hash table from 0GB to 9GB. This patch changes the behaviour to iterate over any holes up to the end of memory. Fixes: 1515ab93 ("powerpc/mm: Dump hash table") Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Oliver O'Halloran authored
The current page table dumper scans the Linux page tables and coalesces mappings with adjacent virtual addresses and similar PTE flags. This behaviour is somewhat broken when you consider the IOREMAP space where entirely unrelated mappings will appear to be virtually contiguous. This patch modifies the range coalescing so that only ranges that are both physically and virtually contiguous are combined. This patch also adds to the dump output the physical address at the start of each range. Fixes: 8eb07b18 ("powerpc/mm: Dump linux pagetables") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Print the physicall address with 0x like the other addresses] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Oliver O'Halloran authored
On Book3s we have two PTE flags used to mark cache-inhibited mappings: _PAGE_TOLERANT and _PAGE_NON_IDEMPOTENT. Currently the kernel page table dumper only looks at the generic _PAGE_NO_CACHE which is defined to be _PAGE_TOLERANT. This patch modifies the dumper so both flags are shown in the dump. Fixes: 8eb07b18 ("powerpc/mm: Dump linux pagetables") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Balbir Singh authored
Currently sys_mmap() and sys_mmap2() (32-bit only), are not visible to the syscall tracing machinery. This means users are not able to see the execution of mmap() syscalls using the syscall tracer. Fix that by using SYSCALL_DEFINE6 for sys_mmap() and sys_mmap2() so that the meta-data associated with these syscalls is visible to the syscall tracer. A side-effect of this change is that the return type has changed from unsigned long to long. However this should have no effect, the only code in the kernel which uses the result of these syscalls is in the syscall return path, which is written in asm and treats the result as unsigned regardless. Example output: cat-3399 [001] .... 196.542410: sys_mmap(addr: 7fff922a0000, len: 20000, prot: 3, flags: 812, fd: 3, offset: 1b0000) cat-3399 [001] .... 196.542443: sys_mmap -> 0x7fff922a0000 cat-3399 [001] .... 196.542668: sys_munmap(addr: 7fff922c0000, len: 6d2c) cat-3399 [001] .... 196.542677: sys_munmap -> 0x0 Signed-off-by: Balbir Singh <bsingharora@gmail.com> [mpe: Massage change log, add detail on return type change] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Recently in commit f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB"), we increased H_PGD_INDEX_SIZE to 15 when we're building with 64K pages. This makes it larger than RADIX_PGD_INDEX_SIZE (13), which means the logic to calculate MAX_PGD_INDEX_SIZE in book3s/64/pgtable.h is wrong. The end result is that the PGD (Page Global Directory, ie top level page table) of the kernel (aka. swapper_pg_dir), is too small. This generally doesn't lead to a crash, as we don't use the full range in normal operation. However if we try to dump the kernel pagetables we can trigger a crash because we walk off the end of the pgd into other memory and eventually try to dereference something bogus: $ cat /sys/kernel/debug/kernel_pagetables Unable to handle kernel paging request for data at address 0xe8fece0000000000 Faulting instruction address: 0xc000000000072314 cpu 0xc: Vector: 380 (Data SLB Access) at [c0000000daa13890] pc: c000000000072314: ptdump_show+0x164/0x430 lr: c000000000072550: ptdump_show+0x3a0/0x430 dar: e802cf0000000000 seq_read+0xf8/0x560 full_proxy_read+0x84/0xc0 __vfs_read+0x6c/0x1d0 vfs_read+0xbc/0x1b0 SyS_read+0x6c/0x110 system_call+0x38/0xfc The root cause is that MAX_PGD_INDEX_SIZE isn't actually computed to be the max of H_PGD_INDEX_SIZE or RADIX_PGD_INDEX_SIZE. To fix that move the calculation into asm-offsets.c where we can do it easily using max(). Fixes: f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
This merges the arch part of the XIVE support, leaving the final commit with the KVM specific pieces dangling on the branch for Paul to merge via the kvm-ppc tree.
-
- 10 Apr, 2017 19 commits
-
-
Gautham R. Shenoy authored
POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus the HSPRG0 of a thread waking up from can contain the paca pointer of its sibling. This patch implements a context recovery framework within threads of a core, by provisioning space in paca_struct for saving every sibling threads's paca pointers. Basically, we should be able to arrive at the right paca pointer from any of the thread's existing paca pointer. At bootup, during powernv idle-init, we save the paca address of every CPU in each one its siblings paca_struct in the slot corresponding to this CPU's index in the core. On wakeup from a stop, the thread will determine its index in the core from the TIR register and recover its PACA pointer by indexing into the correct slot in the provisioned space in the current PACA. Furthermore, ensure that the NVGPRs are restored from the stack on the way out by setting the NAPSTATELOST in paca. [Changelog written with inputs from svaidy@linux.vnet.ibm.com] Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Call it a bug] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Gautham R. Shenoy authored
Currently during idle-init on power9, if we don't find suitable stop states in the device tree that can be used as the default_stop/deepest_stop, we set stop0 (ESL=1,EC=1) as the default stop state psscr to be used by power9_idle and deepest stop state which is used by CPU-Hotplug. However, if the platform firmware has not configured or enabled a stop state, the kernel should not make any assumptions and fallback to a default choice. If the kernel uses a stop state that is not configured by the platform firmware, it may lead to further failures which should be avoided. In this patch, we modify the init code to ensure that the kernel uses only the stop states exposed by the firmware through the device tree. When a suitable default stop state isn't found, we disable ppc_md.power_save for power9. Similarly, when a suitable deepest_stop_state is not found in the device tree exported by the firmware, fall back to the default busy-wait loop in the CPU-Hotplug code. [Changelog written with inputs from svaidy@linux.vnet.ibm.com] Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Gautham R. Shenoy authored
Currently, the powernv cpu-offline function assumes that platform idle states such as stop on POWER9, winkle/sleep/nap on POWER8 are always available. On POWER8, it picks nap as the default state if other deep idle states like sleep/winkle are not available and enabled in the platform. On POWER9, nap is not available and all idle states are managed by STOP instruction. The parameters to the idle state are passed through processor stop status control register (PSSCR). Hence as such executing STOP would take parameters from current PSSCR. We do not want to make any assumptions in kernel on what STOP states and PSSCR features are configured by the platform. Ideally platform will configure a good set of stop states that can be used in the kernel. We would like to start with a clean slate, if the platform choose to not configure any state or there is an error in platform firmware that lead to no stop states being configured or allowed to be requested. This patch adds a fallback method for CPU-Hotplug that is similar to snooze loop at idle where the threads are left to spin at low priority and hence reduce the cycles consumed. This is a safe fallback mechanism in the case when no stop state would be requested if the platform firmware did not configure them most likely due to an error condition. Requesting a stop state when the platform has not configured them or enabled them would lead to further error conditions which could be difficult to debug. [Changelog written with inputs from svaidy@linux.vnet.ibm.com] Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Gautham R. Shenoy authored
Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which transitions the CPU to the deepest available platform idle state to a new function named pnv_cpu_offline() in powernv/idle.c. The rationale behind this code movement is that the data required to determine the deepest available platform state resides in powernv/idle.c. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Anshuman Khandual authored
Add user space exported API definitions for 512KB, 1MB, 2MB, 8MB, 16MB, 1GB, 16GB non default huge page sizes to be used with mmap() system call. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Reword the comment to emphasise that these are only needed to use the non-default huge page size, and updated the change log.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Anshuman Khandual authored
Generic core VM already prints these information in the log buffer, hence there is no need for a second print. This just removes the second print from arch powerpc NUMA init path. Before the patch: $ dmesg | grep "Initmem" numa: Initmem setup node 0 [mem 0x00000000-0xffffffff] numa: Initmem setup node 1 [mem 0x100000000-0x1ffffffff] numa: Initmem setup node 2 [mem 0x200000000-0x2ffffffff] numa: Initmem setup node 3 [mem 0x300000000-0x3ffffffff] numa: Initmem setup node 4 [mem 0x400000000-0x4ffffffff] numa: Initmem setup node 5 [mem 0x500000000-0x5ffffffff] numa: Initmem setup node 6 [mem 0x600000000-0x6ffffffff] numa: Initmem setup node 7 [mem 0x700000000-0x7ffffffff] Initmem setup node 0 [mem 0x0000000000000000-0x00000000ffffffff] Initmem setup node 1 [mem 0x0000000100000000-0x00000001ffffffff] Initmem setup node 2 [mem 0x0000000200000000-0x00000002ffffffff] Initmem setup node 3 [mem 0x0000000300000000-0x00000003ffffffff] Initmem setup node 4 [mem 0x0000000400000000-0x00000004ffffffff] Initmem setup node 5 [mem 0x0000000500000000-0x00000005ffffffff] Initmem setup node 6 [mem 0x0000000600000000-0x00000006ffffffff] Initmem setup node 7 [mem 0x0000000700000000-0x00000007ffffffff] After the patch just the latter set is printed. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Make sparsemem the default on all 64-bit Book3S platforms. It already is for pseries and ps3, and we need to enable it for powernv because on POWER9 memory between chips is discontiguous. For the other platforms sparsemem should work fine, though it might add a small amount of overhead. We can always force FLATMEM in the defconfigs if necessary. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
setup_initial_memory_limit() is called from early_init_devtree(), which runs prior to feature patching. If the kernel is built with CONFIG_JUMP_LABEL=y and CONFIG_JUMP_LABEL_FEATURE_CHECKS=y then we will potentially get the wrong value. If we also have CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG=y we get a warning and backtrace: Warning! mmu_has_feature() used prior to jump label init! CPU: 0 PID: 0 Comm: swapper Not tainted 4.11.0-rc4-gccN-next-20170331-g6af2434c #1 Call Trace: [c000000000fc3d50] [c000000000a26c30] .dump_stack+0xa8/0xe8 (unreliable) [c000000000fc3de0] [c00000000002e6b8] .setup_initial_memory_limit+0xa4/0x104 [c000000000fc3e60] [c000000000d5c23c] .early_init_devtree+0xd0/0x2f8 [c000000000fc3f00] [c000000000d5d3b0] .early_setup+0x90/0x11c [c000000000fc3f90] [c000000000000520] start_here_multiplatform+0x68/0x80 Fix it by using early_mmu_has_feature(). Fixes: c12e6f24 ("powerpc: Add option to use jump label for mmu_has_feature()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
These files don't seem to have any need for asm/debug.h, now that all it includes are the debugger hooks and breakpoint definitions. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
powerpc_debugfs_root is the dentry representing the root of the "powerpc" directory tree in debugfs. Currently it sits in asm/debug.h, a long with some other things that have "debug" in the name, but are otherwise unrelated. Pull it out into a separate header, which also includes linux/debugfs.h, and convert all the users to include debugfs.h instead of debug.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Alistair Popple authored
In the recent commit 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2") the NPU code gained a dependency on MMU notifiers. All our defconfigs have KVM enabled, which selects MMU_NOTIFIER, but if KVM is not enabled then the build breaks. Fix it by always selecting MMU_NOTIFIER when we're building powernv. Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2") Signed-off-by: Alistair Popple <alistair@popple.id.au> [mpe: Reword change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
For a tlbiel with pid, we need to issue tlbiel with set number encoded. We don't need to do ptesync for each of those. Instead we need one for the entire tlbiel pid operation. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
For fullmm tlb flush, we do a flush with RIC_FLUSH_ALL which will invalidate all related caches (radix__tlb_flush()). Hence the pwc flush is not needed. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
We need to set LPES in order for normal external interrupts (0x500) to be directed to the guest while running in guest state. We also need HEIC set to prevent them to be sent to the host while in host state. With XIVE the host never gets one of these and wouldn't know how to handle it. All host external interrupts come in via the new hypervisor virtualization interrupts vector. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
We have all sort of variants of MMIO accessors for the real mode instructions. This creates a clean set of accessors based on Linux normal naming conventions, replacing all occurrences of the old ones in the tree. I have purposefully removed the "out/in" variants in favor of only including __raw variants. Any code using these is already pretty much hand tuned to operate in a very specific environment. I've fixed up the 2 users (only one of them actually needed a barrier in the first place). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
The function doesn't exist anymore Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
It's only used within the same file it's defined Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
We traditionally have linux/ before asm/ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
The XIVE interrupt controller is the new interrupt controller found in POWER9. It supports advanced virtualization capabilities among other things. Currently we use a set of firmware calls that simulate the old "XICS" interrupt controller but this is fairly inefficient. This adds the framework for using XIVE along with a native backend which OPAL for configuration. Later, a backend allowing the use in a KVM or PowerVM guest will also be provided. This disables some fast path for interrupts in KVM when XIVE is enabled as these rely on the firmware emulation code which is no longer available when the XIVE is used natively by Linux. A latter patch will make KVM also directly exploit the XIVE, thus recovering the lost performance (and more). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Fixup pr_xxx("XIVE:"...), don't split pr_xxx() strings, tweak Kconfig so XIVE_NATIVE selects XIVE and depends on POWERNV, fix build errors when SMP=n, fold in fixes from Ben: Don't call cpu_online() on an invalid CPU number Fix irq target selection returning out of bounds cpu# Extra sanity checks on cpu numbers ] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 07 Apr, 2017 1 commit
-
-
Benjamin Herrenschmidt authored
Some powerpc platforms use this to move IRQs away from a CPU being unplugged. This function has several bugs such as not taking the right locks or failing to NULL check pointers. There's a new generic function doing exactly the same thing without all the bugs, so let's use it instead. mpe: The obvious place for the select of GENERIC_IRQ_MIGRATION is on HOTPLUG_CPU, but that doesn't work. On some configs PM_SLEEP_SMP will select HOTPLUG_CPU even though its dependencies are not met, which means the select of GENERIC_IRQ_MIGRATION doesn't happen. That leads to the build breaking. Fix it by moving the select of GENERIC_IRQ_MIGRATION to SMP. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 06 Apr, 2017 3 commits
-
-
Benjamin Herrenschmidt authored
Some platforms (will) need to perform allocations before bringing a new CPU online. Doing it from smp_ops->setup_cpu is the wrong thing to do: - It has no useful failure path (too late) - Calling any allocator will enable interrupts prematurely causing problems with large decrementer among others Instead, add a new callback that is called from __cpu_up (so from the context trying to online the new CPU) at a point where we can safely allocate and handle failures. This will be used by XIVE support. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
Add 32 and 8 bit variants Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 04 Apr, 2017 3 commits
-
-
Matt Brown authored
New versions of OPAL have a device node /ibm,opal/firmware/exports, each property of which describes a range of memory in OPAL that Linux might want to export to userspace for debugging. This patch adds a sysfs file under 'opal/exports' for each property found there, and makes it read-only by root. Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com> [mpe: Drop counting of props, rename to attr, free on sysfs error, c'log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sukadev Bhattiprolu authored
When booting very large systems with a large initrd, we run out of space early in boot for either RTAS or the flattened device tree (FDT). Boot fails with messages like: Could not allocate memory for RTAS or No memory for flatten_device_tree (no room) Increasing the minimum RMA size to 512MB fixes the problem. This should not have an impact on smaller LPARs (with 256MB memory), as the firmware will cap the RMA to the memory assigned to the LPAR. Fix is based on input/discussions with Michael Ellerman. Thanks to Praveen K. Pandey for testing on a large system. Reported-by: Praveen K. Pandey <preveen.pandey@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Alistair Popple authored
Nvlink2 supports address translation services (ATS) allowing devices to request address translations from an mmu known as the nest MMU which is setup to walk the CPU page tables. To access this functionality certain firmware calls are required to setup and manage hardware context tables in the nvlink processing unit (NPU). The NPU also manages forwarding of TLB invalidates (known as address translation shootdowns/ATSDs) to attached devices. This patch exports several methods to allow device drivers to register a process id (PASID/PID) in the hardware tables and to receive notification of when a device should stop issuing address translation requests (ATRs). It also adds a fault handler to allow device drivers to demand fault pages in. Signed-off-by: Alistair Popple <alistair@popple.id.au> [mpe: Fix up comment formatting, use flush_tlb_mm()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 03 Apr, 2017 6 commits
-
-
Alistair Popple authored
The pnv_pci_get_{gpu|npu}_dev functions are used to find associations between nvlink PCIe devices and standard PCIe devices. However they lacked basic sanity checking which results in NULL pointer dereferencing if they are incorrect called can be harder to spot than an explicit WARN_ON. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Alistair Popple authored
There is of_property_read_u32_index but no u64 variant. This patch adds one similar to the u32 version for u64. Signed-off-by: Alistair Popple <alistair@popple.id.au> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Oliver O'Halloran authored
The code to fix the problem it describes was removed in commit c40785ad ("powerpc/dart: Use a cachable DART"), and it uses the stupid comment style. Away it goooooooooooooes! Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Anton Blanchard authored
Early on in do_page_fault() we call store_updates_sp(), regardless of the type of exception. For an instruction miss this doesn't make sense, because we only use this information to detect if a data miss is the result of a stack expansion instruction or not. Worse still, it results in a data miss within every userspace instruction miss handler, because we try and load the very instruction we are about to install a pte for! A simple exec microbenchmark runs 6% faster on POWER8 with this fix: #include <stdlib.h> #include <stdio.h> #include <unistd.h> int main(int argc, char *argv[]) { unsigned long left = atol(argv[1]); char leftstr[16]; if (left-- == 0) return 0; sprintf(leftstr, "%ld", left); execlp(argv[0], argv[0], leftstr, NULL); perror("exec failed\n"); return 0; } Pass the number of iterations on the command line (eg 10000) and time how long it takes to execute. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
For an MCE (Machine Check Exception) that hits while in user mode MSR(PR=1), print the task info to the console MCE error log. This may help to identify an application that triggered the MCE. After this patch the MCE console looks like: Severe Machine check interrupt [Recovered] NIP: [0000000010039778] PID: 762 Comm: ebizzy Initiator: CPU Error type: SLB [Multihit] Effective address: 0000000010039778 Severe Machine check interrupt [Not recovered] NIP: [0000000010039778] PID: 763 Comm: ebizzy Initiator: CPU Error type: UE [Page table walk ifetch] Effective address: 0000000010039778 ebizzy[763]: unhandled signal 7 at 0000000010039778 nip 0000000010039778 lr 0000000010001b44 code 30004 Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Mahesh Salgaonkar authored
For D-side errors we print the load/store address that caused the machine check as 'Effective address'. But the instruction that may have caused the machine check can also be helpful, so in addition to printing the NIP, also print the kernel function name as well. After this patch the MCE console log would look like: Severe Machine check interrupt [Recovered] NIP [d00000001bc70194]: init_module+0x194/0x2b0 [bork_kernel] Initiator: CPU Error type: SLB [Parity] Effective address: d000000026de0000 Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-