- 18 Oct, 2018 17 commits
Michael Ellerman authored
This is a nice cleanup: arch/powerpc/Makefile is long and messy, so moving this out helps a little. It also allows us to do:
  $ make arch/powerpc
which can be helpful if you just want to compile-test some changes to arch code without linking everything. Finally, it gives us a single place to do subdir-cc-flags assignments that affect the whole of arch/powerpc, which we will do in a future patch.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
There's some antiquated debug output that tries to do a hand-made hexdump and turns into horrible 1-byte-per-line output these days. Use print_hex_dump() instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
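print_hex_dump() is a kernel API and is not available in userspace, so the following is only a hypothetical userspace stand-in (hexdump_row() and hexdump() are made-up names) showing the intended effect: 16 bytes per row with an offset prefix, roughly in the spirit of DUMP_PREFIX_OFFSET, instead of one byte per line. It does not try to reproduce the kernel's exact output format.

```c
#include <stdio.h>
#include <stddef.h>

#define HEXDUMP_ROW_SIZE 16   /* bytes per output row (hypothetical choice) */

/* Format one row of up to 16 bytes, starting at offset 'off', as
 * "OFFSET: xx xx ...". Returns the number of characters produced. */
static int hexdump_row(char *out, size_t outlen, const unsigned char *buf,
                       size_t len, size_t off)
{
    int n = snprintf(out, outlen, "%08zx:", off);
    for (size_t i = off; i < len && i < off + HEXDUMP_ROW_SIZE; i++) {
        if (n < 0 || (size_t)n >= outlen)
            break;              /* output buffer full: stop instead of overflowing */
        n += snprintf(out + n, outlen - n, " %02x", buf[i]);
    }
    return n;
}

/* Dump a whole buffer, one row of up to 16 bytes per line. */
static void hexdump(const void *data, size_t len)
{
    char line[64];   /* a full row needs 9 + 16 * 3 + 1 = 58 bytes */
    for (size_t off = 0; off < len; off += HEXDUMP_ROW_SIZE) {
        hexdump_row(line, sizeof(line), data, len, off);
        puts(line);
    }
}
```

For a 20-byte buffer this prints two lines (16 bytes, then 4) rather than twenty.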
-
Christophe Leroy authored
do_exit() already includes a test to panic() if in_interrupt(), so this patch removes the powerpc one, which is redundant.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
When creating the boot-time FDT from an actual Open Firmware live tree, let's generate "phandle" properties for the phandles instead of the old deprecated "linux,phandle".
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Unsplit warning printf()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
prom_init.c must not modify the kernel image outside of the .bss.prominit section. Thus make sure that prom_init.o doesn't have anything in any of these sections: .data, .bss, .init.data.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
This makes __prombss its own section, and for now stores it in .bss. This will give us the ability later to store it elsewhere and/or free it after boot (it's about 8KB).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
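A minimal sketch of what a dedicated-section annotation looks like, assuming GCC or Clang on an ELF target. The section name .bss.prominit comes from the prom_init.c commit in this log; the macro body, prom_scratch, prom_counter and the two accessors are hypothetical. Because the name begins with ".bss.", the toolchain emits a NOBITS section, so the variables are zero-filled at load time and take no space in the file image, and a default GNU ld script folds .bss.* into .bss.

```c
/* Hypothetical annotation placing uninitialised data in its own section. */
#define __prombss __attribute__((section(".bss.prominit")))

static char prom_scratch[64] __prombss;   /* zero-filled at load time */
static int prom_counter __prombss;

/* Accessors so the variables can be exercised from other code. */
int prom_counter_bump(void) { return ++prom_counter; }
char *prom_scratch_buf(void) { return prom_scratch; }
```

Keeping such data in a named section of its own is what later makes it possible to relocate or free it as a unit.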
-
Benjamin Herrenschmidt authored
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
As they are no longer used past the end of prom_init.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
Make the existing initialized definition constant and copy it to a __prombss copy.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
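The pattern described above can be sketched as follows. The struct, its fields and arch_vec_init() are hypothetical, not the kernel's actual definitions: the initialised template becomes const (read-only data), and a zero-filled working copy, which the kernel would tag __prombss, is populated from it before any runtime fix-ups are applied.

```c
#include <string.h>

struct arch_vec {               /* hypothetical layout */
    unsigned int max_cpus;
    unsigned char flags[4];
};

/* Initialised template: const, so it lives in read-only data. */
static const struct arch_vec arch_vec_template = {
    .max_cpus = 1,
    .flags = { 0x80, 0x00, 0x01, 0x00 },
};

/* Zero-filled working copy; in the kernel this would be marked __prombss. */
static struct arch_vec arch_vec;

/* Copy the template, then patch only the copy at runtime. */
void arch_vec_init(unsigned int ncpus)
{
    memcpy(&arch_vec, &arch_vec_template, sizeof(arch_vec));
    arch_vec.max_cpus = ncpus;
}
```

The template is never written, so the only modified memory is the copy in the dedicated section.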
-
Benjamin Herrenschmidt authored
Initialize it dynamically instead of statically.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
We removed support for running under any OPAL version earlier than v3 in 2015 (they never saw the light of day anyway), but we kept some leftovers of this support in prom_init.c, so let's take it out.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
This replaces all occurrences of __initdata for uninitialized data with a new __prombss. Currently __promdata is defined to be __initdata but we'll eventually change that.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Oliver O'Halloran authored
Adds a driver that implements support for enabling and accessing PAPR SCM regions. Unfortunately, due to how the PAPR interface works, we can't use the existing of_pmem driver (yet) because:
 a) The guest is required to use the H_SCM_BIND_MEM h-call to add the SCM region to its physical address space, and
 b) There is currently no mechanism for relating a bare of_pmem region to the backing DIMM (or not-a-DIMM for our case).
Both of these are easily handled by rolling the functionality into a separate driver, so here we are...
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Oliver O'Halloran authored
This patch implements support for discovering storage class memory devices at boot and for handling hotplug of new regions via RTAS hotplug events.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Fix CONFIG_MEMORY_HOTPLUG=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
When printing the machine check cause, the cause appears on the following line due to bad use of printk without \n:
  [ 33.663993] Machine check in kernel mode.
  [ 33.664011] Caused by (from SRR1=9032):
  [ 33.664036] Data access error at address c90c8000
This patch fixes it by using pr_cont() for the second part:
  [ 133.258131] Machine check in kernel mode.
  [ 133.258146] Caused by (from SRR1=9032): Data access error at address c90c8000
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Book3e defines both _PAGE_USER and _PAGE_PRIVILEGED, so the default nohash pte_mkprivileged() and pte_mkuser() helpers are not usable. This patch redefines them for book3e. In theory, only pte_mkprivileged() needs to be redefined because _PAGE_USER includes _PAGE_PRIVILEGED, but it is less confusing to redefine both.
Fixes: a0da4bc1 ("powerpc/mm: Allow platforms to redefine some helpers")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
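To see why a generic helper breaks when _PAGE_USER includes _PAGE_PRIVILEGED, here is a hedged sketch. The bit values are illustrative only, and generic_pte_mkprivileged() assumes the generic nohash default simply clears _PAGE_USER; the book3e-style versions clear the wider mask and then set the narrower one.

```c
#include <stdint.h>

/* Illustrative bit values (hypothetical): "user" is a superset of "privileged". */
#define _PAGE_BAP_SR     0x004u                        /* supervisor access */
#define _PAGE_BAP_UR     0x008u                        /* user access */
#define _PAGE_PRIVILEGED (_PAGE_BAP_SR)
#define _PAGE_USER       (_PAGE_BAP_UR | _PAGE_BAP_SR)

typedef uint64_t pte_t;   /* simplified: a PTE is just its raw value */

/* Assumed generic helper: also wipes the supervisor bit, so the kernel
 * itself would lose access to the page. */
static pte_t generic_pte_mkprivileged(pte_t pte)
{
    return pte & ~(pte_t)_PAGE_USER;
}

/* book3e-style: drop user access but keep (or set) supervisor access. */
static pte_t pte_mkprivileged(pte_t pte)
{
    return (pte & ~(pte_t)_PAGE_USER) | _PAGE_PRIVILEGED;
}

/* book3e-style: grant user access, which includes supervisor access. */
static pte_t pte_mkuser(pte_t pte)
{
    return (pte & ~(pte_t)_PAGE_PRIVILEGED) | _PAGE_USER;
}
```

With these values, pte_mkprivileged(0x10C) yields 0x104 and pte_mkuser(0x104) restores 0x10C, while the generic helper would yield 0x100.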
-
Aneesh Kumar K.V authored
Other archs do the same: instead of adding back the required pte bits (which got masked out) in __ioremap_at(), make sure we filter only the pfn bits out.
Fixes: 26973fa5 ("powerpc/mm: use pte helpers in generic code")
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 14 Oct, 2018 23 commits
Aneesh Kumar K.V authored
Currently we limit the max addressable memory to 128TB. This patch increases the limit to 2PB. We can have devices like nvdimm which add memory above the 512TB limit.
We still don't support regular system RAM above 512TB. One of the challenges with that is the percpu allocator, which allocates per-node memory and uses the max distance between them as the percpu offsets. This means that with a large gap in the address space (system RAM above 1PB) we will run out of vmalloc space to map the percpu allocation.
In order to support addressable memory above 512TB, the kernel should be able to linear map this range. To do that with hash translation we now add 4 contexts to the kernel linear map region. Our per-context addressable range is 512TB. We still keep the VMALLOC and VMEMMAP regions at their old size. The SLB miss handlers are updated to validate these limits.
We also limit this update to SPARSEMEM_VMEMMAP and SPARSEMEM_EXTREME.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
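The context count follows directly from the sizes quoted above: 512TB is 2^49 bytes and 2PB is 2^51 bytes, so covering the linear map needs 2^51 / 2^49 = 4 contexts. A worked sketch of that arithmetic; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define TB (1ULL << 40)
#define PB (1ULL << 50)

#define CONTEXT_RANGE       (512 * TB)                    /* 2^49 bytes per context */
#define MAX_ADDRESSABLE     (2 * PB)                      /* 2^51 bytes */
#define LINEAR_MAP_CONTEXTS (MAX_ADDRESSABLE / CONTEXT_RANGE)  /* = 4 */

/* Which linear-map context covers a given offset into the linear map;
 * returns -1 for offsets beyond the supported limit. */
static int linear_map_context_index(uint64_t offset)
{
    if (offset >= MAX_ADDRESSABLE)
        return -1;
    return (int)(offset / CONTEXT_RANGE);
}
```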
-
Aneesh Kumar K.V authored
We will be adding get_kernel_context later. Update the function name to indicate that it handles context allocation for user space addresses.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
This adds CONFIG_DEBUG_VM checks to ensure:
 - The kernel stack is in the SLB after it's flushed and bolted.
 - We don't insert an SLB entry for an address that is already in the SLB.
 - The kernel SLB miss handler does not take an SLB miss.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
slb_flush_and_rebolt() is misleading: it is called in virtual mode, so it cannot possibly change the stack, so it should not be touching the shadow area. And since vmalloc is no longer bolted, it should not change any bolted mappings at all. Change the name to slb_flush_and_restore_bolted(), and have it just load the kernel stack from what's currently in the shadow SLB area.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
When switching processes, currently all user SLBEs are cleared, and a few (exec_base, pc, and stack) are preloaded. In trivial testing with small apps, this tends to miss the heap and low 256MB segments, and it will also miss commonly accessed segments on large memory workloads.
Add a simple round-robin preload cache that just inserts the last SLB miss into the head of the cache and preloads those at context switch time. Every 256 context switches, the oldest entry is removed from the cache to shrink the cache and require fewer slbmte if they are unused.
Much more could go into this, including into the SLB entry reclaim side to track some LRU information etc., which would require a study of large memory workloads. But this is a simple thing we can do now that is an obvious win for common workloads.
With the full series, process switching speed on the context_switch benchmark on POWER9/hash (with kernel speculation security measures disabled) increases from 140K/s to 178K/s (27%). POWER8 does not change much (within 1%); it's unclear why it does not see a big gain like POWER9.
Booting to busybox init with 256MB segments has SLB misses go down from 945 to 69, and with 1T segments from 900 to 21. These could almost all be eliminated by preloading a bit more carefully with ELF binary loading.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
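A minimal sketch of the round-robin preload cache described above, under stated assumptions: the cache size of 16, the struct and the function names are all hypothetical. Each SLB miss records its ESID (skipping duplicates); at context switch the cached ESIDs would be preloaded, and every 256th switch drops the oldest entry.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRELOAD_NR 16   /* hypothetical cache size */

struct slb_preload {
    uint32_t esid[PRELOAD_NR];
    uint8_t nr;         /* number of valid entries */
    uint8_t tail;       /* next slot to write; newest entry is tail - 1 */
    uint8_t switches;   /* context switches since the last ageing */
};

/* Is this ESID among the nr most recently inserted entries? */
static bool preload_hit(const struct slb_preload *p, uint32_t esid)
{
    for (unsigned i = 0; i < p->nr; i++) {
        unsigned idx = (p->tail + PRELOAD_NR - 1 - i) % PRELOAD_NR;
        if (p->esid[idx] == esid)
            return true;
    }
    return false;
}

/* Record the segment of the latest SLB miss; when full, overwrite the oldest. */
static void preload_add(struct slb_preload *p, uint32_t esid)
{
    if (preload_hit(p, esid))
        return;
    p->esid[p->tail] = esid;
    p->tail = (p->tail + 1) % PRELOAD_NR;
    if (p->nr < PRELOAD_NR)
        p->nr++;
}

/* Called on every context switch: the uint8_t counter wraps to 0 after
 * 256 increments, at which point the oldest entry, at (tail - nr), is dropped. */
static void preload_on_switch(struct slb_preload *p)
{
    p->switches++;
    if (p->switches == 0 && p->nr > 0)
        p->nr--;
}
```

Shrinking only by decrementing nr keeps ageing O(1): the oldest entry is always the one furthest behind the tail.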
-
Nicholas Piggin authored
This will be used by the SLB code in the next patch, but for now this sets the slb_addr_limit to the correct size for 32-bit tasks.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Add 32-entry bitmaps to track the allocation status of the first 32 SLB entries, and whether they are user or kernel entries. These are used to allocate free SLB entries first, before resorting to the round robin allocator.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
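A hedged sketch of the bitmap-first allocation strategy described above. The struct, function names and the 32-entry round-robin range are hypothetical; __builtin_ctz is a GCC/Clang builtin used to find the first free (zero) bit. slb_flush_user() illustrates one plausible use of the kernel bitmap: after flushing user entries, only kernel entries remain allocated.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLB_TRACKED 32   /* entries covered by the bitmaps */

struct slb_alloc {
    uint32_t used;   /* bit i set: entry i is allocated */
    uint32_t kern;   /* bit i set: entry i holds a kernel mapping */
    unsigned rr;     /* round-robin cursor, used once every entry is taken */
};

static unsigned alloc_slb_index(struct slb_alloc *s, bool kernel)
{
    unsigned index;

    if (s->used != UINT32_MAX) {
        /* Free entry available: take the lowest clear bit. */
        index = (unsigned)__builtin_ctz(~s->used);
        s->used |= 1u << index;
    } else {
        /* All tracked entries in use: fall back to round robin. */
        index = s->rr;
        s->rr = (s->rr + 1) % SLB_TRACKED;
    }
    if (kernel)
        s->kern |= 1u << index;
    else
        s->kern &= ~(1u << index);
    return index;
}

/* After flushing all user entries, only kernel entries stay allocated. */
static void slb_flush_user(struct slb_alloc *s)
{
    s->used &= s->kern;
}
```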
-
Nicholas Piggin authored
This patch moves the SLB miss handlers completely to C, using the standard exception handler macros to set up the stack and branch to C. This can be done because the segment containing the kernel stack is always bolted, so accessing it with relocation on will not cause an SLB exception.
Arbitrary kernel memory must not be accessed when handling kernel space SLB misses, so care should be taken there. However, user SLB misses can access any kernel memory, which can be used to move some fields out of the paca (in later patches).
User SLB misses could quite easily reconcile IRQs and set up a first class kernel environment and exit via ret_from_except; however, that doesn't seem to be necessary at the moment, so we only do that if a bad fault is encountered.
[ Credit to Aneesh for bug fixes, error checks, and improvements to bad address handling, etc ]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Disallow tracing for all of slb.c for now.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
PPR is the odd register out when it comes to interrupt handling: it is saved in current->thread.ppr while all others are saved on the stack.
The difficulty with this is that accessing thread.ppr can cause an SLB fault, but the change implementing the SLB fault handler in C had assumed the normal exception entry handlers would not cause an SLB fault.
Fix this by allocating room in the interrupt stack to save PPR.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Now that we've split the user & kernel versions of pt_regs, we need to be more careful in the ptrace code. For now we've ensured the location of the fields in both structs is the same, so most of the ptrace code doesn't need updating. But there are a few places where we use sizeof(pt_regs), and these will be wrong as soon as we increase the size of the kernel structure. So flip them all to use sizeof(user_pt_regs).
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
We use a shared definition for struct pt_regs in uapi/asm/ptrace.h. That means the layout of the structure is ABI, i.e. we can't change it.
That would be fine if it was only used to describe the user-visible register state of a process, but it's also the struct we use in the kernel to describe the registers saved in an interrupt frame.
We'd like more flexibility in the content (and possibly layout) of the kernel version of the struct, but currently that's not possible.
So split the definition into a user-visible definition which remains unchanged, and a kernel-internal one.
At the moment they're still identical, and we check that at build time. That's because we have code (in ptrace etc.) that assumes that they are the same. We will fix that code in future patches, and then we can break the strict symmetry between the two structs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
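A simplified sketch of the split, assuming drastically reduced register sets (the real structures have many more fields). The kernel struct wraps the user one in a union, C11 _Static_assert stands in for the kernel's build-time checks, and copy_regs_to_user_view() (a hypothetical name) shows the ptrace-side rule of sizing copies by sizeof(struct user_pt_regs).

```c
#include <stddef.h>
#include <string.h>

/* User-visible, ABI-fixed layout (simplified). */
struct user_pt_regs {
    unsigned long gpr[32];
    unsigned long nip;
    unsigned long msr;
};

/* Kernel-internal view: for now it overlays the user layout exactly,
 * but fields could be appended after the union later. */
struct pt_regs {
    union {
        struct user_pt_regs user_regs;
        struct {
            unsigned long gpr[32];
            unsigned long nip;
            unsigned long msr;
        };
    };
};

/* Build-time checks: fail to compile if the layouts drift apart while
 * code still assumes they are identical. */
_Static_assert(sizeof(struct pt_regs) == sizeof(struct user_pt_regs),
               "pt_regs and user_pt_regs must have the same size");
_Static_assert(offsetof(struct pt_regs, msr) == offsetof(struct user_pt_regs, msr),
               "msr must be at the same offset in both structs");

/* ptrace-style copy: size it by the ABI struct, not the kernel one, so it
 * stays correct once struct pt_regs grows. */
static void copy_regs_to_user_view(struct user_pt_regs *dst,
                                   const struct pt_regs *src)
{
    memcpy(dst, &src->user_regs, sizeof(struct user_pt_regs));
}
```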
-
Benjamin Herrenschmidt authored
It's never modified.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
It is never modified.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
It's not used anywhere else.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
In the same spirit as what was already done for the pte query helpers, this patch changes the pte setting helpers to perform endian conversions on the constants rather than on the pte value. In the meantime, it changes pte_access_permitted() to use the pte helpers for the same reason.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
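Why converting the constant is cheaper can be sketched in userspace, under stated assumptions: PTEs are stored big-endian in memory, to_be64() is a hypothetical stand-in for cpu_to_be64() (the same byte swap also serves as be64_to_cpu), and PAGE_DIRTY is an illustrative flag value. Swapping a compile-time constant can be folded by the compiler, whereas swapping the PTE value costs a byte swap at runtime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-to-big-endian conversion (GCC/Clang predefined macros and builtin). */
static inline uint64_t to_be64(uint64_t v)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return __builtin_bswap64(v);
#else
    return v;
#endif
}

#define PAGE_DIRTY 0x80ULL   /* illustrative flag value */

typedef struct { uint64_t raw; } pte_t;   /* raw holds the big-endian in-memory bits */

/* Query by converting the PTE value at runtime. */
static bool pte_dirty_swap_value(pte_t pte) { return to_be64(pte.raw) & PAGE_DIRTY; }

/* Query by converting the constant instead; to_be64(PAGE_DIRTY) can be
 * folded at compile time, so the PTE itself is never swapped. */
static bool pte_dirty_swap_const(pte_t pte) { return pte.raw & to_be64(PAGE_DIRTY); }

/* Setting helper in the same style: OR in a pre-converted constant. */
static pte_t pte_mkdirty(pte_t pte)
{
    pte.raw |= to_be64(PAGE_DIRTY);
    return pte;
}
```

Both query variants give the same answer; only where the byte swap happens differs.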
-
Christophe Leroy authored
_PAGE_PRIVILEGED corresponds to the SH bit, which doesn't protect against user access but only disables ASID verification on kernel accesses. User access is controlled with the _PMD_USER flag. Name it _PAGE_SH instead of _PAGE_PRIVILEGED.
_PAGE_HUGE corresponds to the SPS bit, which doesn't really tell that it is a huge page but only that it is not a 4k page. Name it _PAGE_SPS instead of _PAGE_HUGE.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Do not include pte-common.h in nohash/32/pgtable.h. As that was the last includer, get rid of pte-common.h.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Cache-related flags like _PAGE_COHERENT and _PAGE_WRITETHRU are defined on most platforms. The platforms not defining them don't define any alternative, so we can give them a zero value directly for those platforms.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
The 40xx defines _PAGE_HWWRITE while others don't. The 8xx defines _PAGE_RO instead of _PAGE_RW. The 8xx defines _PAGE_PRIVILEGED instead of _PAGE_USER. The 8xx defines _PAGE_HUGE and _PAGE_NA while others don't.
Let those platforms redefine pte_write(), pte_wrprotect() and pte_mkwrite(), and get _PAGE_RO and _PAGE_HWWRITE out of the common helpers.
Let the 8xx redefine pte_user(), pte_mkprivileged() and pte_mkuser(), and get rid of the _PAGE_PRIVILEGED and _PAGE_USER default values.
Let the 8xx redefine pte_mkhuge() and get rid of the _PAGE_HUGE default value.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
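The difference between a "writable" bit and a "read-only" bit is easiest to see side by side. In this hedged sketch the bit values are illustrative, and the rw_/ro_ prefixes are hypothetical so both variants fit in one file; in the kernel each platform would simply provide its own pte_write(), pte_wrprotect() and pte_mkwrite(). With a read-only bit the logic inverts, which is why overriding the helpers is cleaner than faking a _PAGE_RW value.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t pte_t;   /* simplified 32-bit PTE */

#define PAGE_RW 0x400u    /* illustrative: set means writable */
#define PAGE_RO 0x200u    /* illustrative: set means read-only (8xx-style) */

/* Common helpers for platforms with a "writable" bit. */
static bool  rw_pte_write(pte_t pte)     { return pte & PAGE_RW; }
static pte_t rw_pte_wrprotect(pte_t pte) { return pte & ~PAGE_RW; }
static pte_t rw_pte_mkwrite(pte_t pte)   { return pte | PAGE_RW; }

/* Overrides for a platform with a "read-only" bit: every test is inverted. */
static bool  ro_pte_write(pte_t pte)     { return !(pte & PAGE_RO); }
static pte_t ro_pte_wrprotect(pte_t pte) { return pte | PAGE_RO; }
static pte_t ro_pte_mkwrite(pte_t pte)   { return pte & ~PAGE_RO; }
```

Note that under the read-only convention a PTE with no permission bits set counts as writable, the opposite of the common case.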
-
Christophe Leroy authored
nohash/64 only uses book3e PTE flags, so it doesn't need pte-common.h. This also allows us to drop PAGE_SAO and H_PAGE_4K_PFN from pte_common.h, as they are only used by PPC64.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
The base kernel PAGE_XXXX definition sets are more or less platform-specific. Let's distribute them close to the platform _PAGE_XXX flag definitions, and customise them to their exact platform flags. Also define _PAGE_PSIZE and _PTE_NONE_MASK for each platform, although they are defined as 0. Do the same with _PMD flags like _PMD_USER and _PMD_PRESENT_MASK.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Now that pte-common.h is only for nohash platforms, let's move the pte_user() helper out of pte-common.h to put it together with the other helpers.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
As done for book3s/64, add the necessary flags/defines in book3s/32/pgtable.h and do not include pte-common.h. This also allows us to remove all related hash definitions from pte-common.h, and to remove the _PAGE_EXEC default, as _PAGE_EXEC is defined on all platforms except book3s/32.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-