1. 20 Oct, 2018 7 commits
  2. 19 Oct, 2018 3 commits
    • powerpc/time: Fix clockevent_decrementer initialisation for PR KVM · b4d16ab5
      Michael Ellerman authored
      In the recent commit 8b78fdb0 ("powerpc/time: Use
      clockevents_register_device(), fixing an issue with large
      decrementer") we changed the way we initialise the decrementer
      clockevent(s).
      
      We no longer initialise the mult & shift values of
      decrementer_clockevent itself.
      
      This has the effect of breaking PR KVM, because it uses those values
      in kvmppc_emulate_dec(). The symptom is that guest kernels spin
      forever mid-way through boot.
      
      For now, fix it by assigning the mult and shift values back to
      decrementer_clockevent.
      
      Fixes: 8b78fdb0 ("powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
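
      To make the symptom concrete, here is a small standalone sketch of the
      mult/shift arithmetic that the clockevents layer uses to convert
      between nanoseconds and device ticks, and that PR KVM's
      kvmppc_emulate_dec() relies on per the message above. The helper and
      the values are illustrative only, not kernel code; the point is that
      with mult and shift left at zero the conversion collapses to zero,
      matching guests that never make progress.

          #include <stdint.h>
          #include <stdio.h>

          /* Roughly the ns -> ticks direction used by clockevents:
           * ticks = (ns * mult) >> shift, where mult / 2^shift approximates
           * the tick rate in GHz. */
          static uint64_t ns_to_ticks(uint64_t ns, uint32_t mult, uint32_t shift)
          {
              return (ns * mult) >> shift;
          }

          int main(void)
          {
              /* Illustrative pair for a ~400 MHz decrementer (0.4 ticks per ns). */
              uint32_t mult = 214748365, shift = 29;

              printf("1 ms -> %llu ticks\n",
                     (unsigned long long)ns_to_ticks(1000000, mult, shift));
              /* Uninitialised mult/shift, as in the bug described above. */
              printf("1 ms with mult=shift=0 -> %llu ticks\n",
                     (unsigned long long)ns_to_ticks(1000000, 0, 0));
              return 0;
          }
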
    • powerpc/aout: Fix struct user definition to use user_pt_regs · 6ce7bff0
      Michael Ellerman authored
      I'm pretty sure this is dead code; it's only used by the a.out core
      dump code, and we don't support a.out. We should remove it.
      
      But while it's in the tree it should be using the ABI version of
      pt_regs which is called user_pt_regs in the kernel, because the whole
      struct is written to the core dump and so its size shouldn't change.
      
      Note this isn't a uapi header so we don't need an ifdef.
      
      Fixes: 002af939 ("powerpc: Split user/kernel definitions of struct pt_regs")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/uapi: Fix sigcontext definition to use user_pt_regs · 22a3d03d
      Michael Ellerman authored
      My recent patch to split pt_regs between user and kernel missed
      the usage in struct sigcontext.
      
      Because this is a user visible struct it should be using the user
      visible definition, which when we're building for the kernel is called
      struct user_pt_regs.
      
      As far as I can see this hasn't actually caused a bug (yet), because
      we don't use sizeof() on sigcontext->regs anywhere. But we should
      still fix it to avoid confusion and future bugs.
      
      Fixes: 002af939 ("powerpc: Split user/kernel definitions of struct pt_regs")
      Reported-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 18 Oct, 2018 19 commits
  4. 14 Oct, 2018 11 commits
    • powerpc/mm: Increase the max addressable memory to 2PB · 4ffe713b
      Aneesh Kumar K.V authored
      Currently we limit the max addressable memory to 128TB. This patch
      increases the limit to 2PB. We can have devices like nvdimm which add
      memory above the 512TB limit.
      
      We still don't support regular system RAM above 512TB. One of the
      challenges with that is the percpu allocator, which allocates per-node
      memory and uses the max distance between nodes as the percpu offset.
      This means that with a large gap in the address space (system RAM
      above 1PB) we will run out of vmalloc space to map the percpu
      allocations.
      
      In order to support addressable memory above 512TB, the kernel should
      be able to linearly map this range. To do that with hash translation
      we now add 4 contexts to the kernel linear map region. Our per-context
      addressable range is 512TB. We still keep the VMALLOC and VMEMMAP
      regions at their old size. The SLB miss handlers are updated to
      validate these limits.
      
      We also limit this update to configurations with SPARSEMEM_VMEMMAP and
      SPARSEMEM_EXTREME enabled.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
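
      As a worked check of the sizing described above (the arithmetic is
      mine, not text from the commit): each hash context covers 512TB =
      2^49 bytes, so 4 contexts for the kernel linear map give
      4 x 2^49 = 2^51 bytes = 2PB.

          #include <stdio.h>

          int main(void)
          {
              unsigned long long per_context = 1ULL << 49;  /* 512TB per context */
              unsigned long long total = 4 * per_context;   /* 4 kernel contexts */

              printf("%llu TB per context, %llu PB total\n",
                     per_context >> 40, total >> 50);       /* 512 TB, 2 PB */
              return 0;
          }
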
    • powerpc/mm/hash: Rename get_ea_context to get_user_context · c9f80734
      Aneesh Kumar K.V authored
      We will be adding get_kernel_context later. Update the function name to
      indicate that this handles context allocation for user space addresses.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Add some SLB debugging tests · e15a4fea
      Nicholas Piggin authored
      This adds CONFIG_DEBUG_VM checks to ensure:
        - The kernel stack is in the SLB after it's flushed and bolted.
        - We don't insert an SLB entry for an address that is already in the SLB.
        - The kernel SLB miss handler does not take an SLB miss.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Simplify slb_flush_and_rebolt() · 94ee4272
      Nicholas Piggin authored
      slb_flush_and_rebolt() is misleadingly named: it is called in virtual
      mode, so it cannot possibly be changing the stack, and therefore it
      should not be touching the shadow area. And since vmalloc is no longer
      bolted, it should not change any bolted mappings at all.
      
      Change the name to slb_flush_and_restore_bolted(), and have it just
      load the kernel stack from what's currently in the shadow SLB area.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Add a SLB preload cache · 5434ae74
      Nicholas Piggin authored
      When switching processes, currently all user SLBEs are cleared, and a
      few (exec_base, pc, and stack) are preloaded. In trivial testing with
      small apps, this tends to miss the heap and low 256MB segments, and it
      will also miss commonly accessed segments on large memory workloads.
      
      Add a simple round-robin preload cache that just inserts the last SLB
      miss into the head of the cache and preloads those entries at context
      switch time. Every 256 context switches, the oldest entry is removed
      from the cache to shrink it, requiring fewer slbmte instructions if
      entries go unused.
      
      Much more could go into this, including work on the SLB entry reclaim
      side to track some LRU information etc., which would require a study
      of large memory workloads. But this is a simple thing we can do now
      that is an obvious win for common workloads.
      
      With the full series, process switching speed on the context_switch
      benchmark on POWER9/hash (with kernel speculation security measures
      disabled) increases from 140K/s to 178K/s (27%).
      
      POWER8 does not change much (within 1%); it's unclear why it does not
      see a big gain like POWER9.
      
      Booting to busybox init with 256MB segments has SLB misses go down
      from 945 to 69, and with 1T segments from 900 to 21. These could
      almost all be eliminated by preloading a bit more carefully with ELF
      binary loading.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
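
      A rough, self-contained sketch of the round-robin cache described
      above (the size, names and placement are illustrative; the real cache
      is per-thread and hooks into the SLB miss and context switch paths):

          #include <stdint.h>
          #include <stdio.h>

          #define PRELOAD_NR 8                    /* illustrative cache size */

          struct slb_preload {
              uint8_t nr;                         /* entries currently cached */
              uint8_t tail;                       /* index of the oldest entry */
              uint32_t esid[PRELOAD_NR];          /* cached segment indexes */
          };

          /* On an SLB miss, remember the faulting segment at the head. */
          static void preload_add(struct slb_preload *p, uint32_t esid)
          {
              uint8_t head;

              if (p->nr < PRELOAD_NR)
                  p->nr++;
              else
                  p->tail = (p->tail + 1) % PRELOAD_NR;   /* overwrite oldest */

              head = (p->tail + p->nr - 1) % PRELOAD_NR;
              p->esid[head] = esid;
          }

          /* Every N context switches, age out the oldest entry so unused
           * segments stop being preloaded (the commit uses N = 256). */
          static void preload_age(struct slb_preload *p)
          {
              if (!p->nr)
                  return;
              p->nr--;
              p->tail = (p->tail + 1) % PRELOAD_NR;
          }

          /* At context switch, each cached esid would be inserted into the
           * SLB (slbmte in the real code); here we just print them. */
          static void preload_all(const struct slb_preload *p)
          {
              for (uint8_t i = 0; i < p->nr; i++)
                  printf("preload esid 0x%x\n",
                         p->esid[(p->tail + i) % PRELOAD_NR]);
          }

          int main(void)
          {
              struct slb_preload p = { 0 };

              preload_add(&p, 0x100);     /* e.g. heap segment */
              preload_add(&p, 0x200);     /* e.g. stack segment */
              preload_age(&p);            /* oldest (0x100) ages out */
              preload_all(&p);            /* only 0x200 is preloaded */
              return 0;
          }
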
    • powerpc/64s/hash: Provide arch_setup_exec() hooks for hash slice setup · 425d3314
      Nicholas Piggin authored
      This will be used by the SLB code in the next patch, but for now this
      sets the slb_addr_limit to the correct size for 32-bit tasks.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Add SLB allocation status bitmaps · 126b11b2
      Nicholas Piggin authored
      Add 32-entry bitmaps to track the allocation status of the first 32
      SLB entries, and whether they are user or kernel entries. These are
      used to allocate free SLB entries first, before resorting to the round
      robin allocator.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
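
      A standalone sketch of the allocation strategy described (the constants
      and names are illustrative; the real code also tracks which entries are
      user vs kernel and keeps the bolted entries out of the pool):

          #include <stdint.h>
          #include <stdio.h>

          #define SLB_TRACKED 32          /* only the first 32 entries tracked */

          static uint32_t slb_used_bitmap;        /* bit n set => entry n in use */
          static unsigned int slb_next_rr;        /* round-robin fallback pointer */

          /* Pick an SLB entry: prefer a free tracked slot, else fall back to
           * round robin over the tracked slots. */
          static unsigned int slb_alloc_index(void)
          {
              if (slb_used_bitmap != 0xffffffffu) {
                  unsigned int i = __builtin_ctz(~slb_used_bitmap); /* first free */
                  slb_used_bitmap |= 1u << i;
                  return i;
              }
              /* All tracked entries in use: evict in round-robin order. */
              unsigned int i = slb_next_rr;
              slb_next_rr = (slb_next_rr + 1) % SLB_TRACKED;
              return i;
          }

          int main(void)
          {
              slb_used_bitmap = 0x0000000f;   /* pretend entries 0-3 are bolted */
              printf("allocated entry %u\n", slb_alloc_index());  /* -> 4 */
              printf("allocated entry %u\n", slb_alloc_index());  /* -> 5 */
              return 0;
          }
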
    • powerpc/64s/hash: Convert SLB miss handlers to C · 48e7b769
      Nicholas Piggin authored
      This patch moves SLB miss handlers completely to C, using the standard
      exception handler macros to set up the stack and branch to C.
      
      This can be done because the segment containing the kernel stack is
      always bolted, so accessing it with relocation on will not cause an
      SLB exception.
      
      Arbitrary kernel memory must not be accessed when handling kernel
      space SLB misses, so care should be taken there. However user SLB
      misses can access any kernel memory, which can be used to move some
      fields out of the paca (in later patches).
      
      User SLB misses could quite easily reconcile IRQs and set up a first
      class kernel environment and exit via ret_from_except; however, that
      doesn't seem to be necessary at the moment, so we only do that if a
      bad fault is encountered.
      
      [ Credit to Aneesh for bug fixes, error checks, and improvements to
        bad address handling, etc ]
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Disallow tracing for all of slb.c for now.]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
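
      A rough sketch of the shape this describes (heavily simplified, with
      illustrative names; the real handler distinguishes more regions and
      has proper error handling):

          /* Entered from the exception code with relocation on, which is safe
           * because the segment holding the kernel stack is always bolted. */
          long do_slb_fault(struct pt_regs *regs, unsigned long ea)
          {
              if (ea >= PAGE_OFFSET) {
                  /* Kernel-space miss: must not touch arbitrary kernel memory,
                   * since doing so could itself take an SLB miss. */
                  return slb_allocate_kernel(ea);
              }
              /* User-space miss: normal kernel data (e.g. the mm context) can
               * be used safely here, thanks to the bolted stack segment. */
              return slb_allocate_user(current->mm, ea);
          }
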
    • powerpc/64: Interrupts save PPR on stack rather than thread_struct · 4c2de74c
      Nicholas Piggin authored
      PPR is the odd register out when it comes to interrupt handling: it is
      saved in current->thread.ppr while all others are saved on the stack.
      
      The difficulty with this is that accessing thread.ppr can cause an SLB
      fault, but the conversion of the SLB fault handler to C had assumed
      that the normal exception entry handlers would not cause an SLB fault.
      
      Fix this by allocating room in the interrupt stack to save PPR.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
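
      A minimal sketch of the idea (the layout is illustrative only; the
      real kernel-side pt_regs arranges its fields differently): once the
      kernel's pt_regs can grow independently of the user ABI (see the
      pt_regs split commit below), the interrupt frame simply gains a PPR
      slot, and exception entry stores SPRN_PPR there rather than
      dereferencing current->thread.ppr.

          struct pt_regs {
              struct user_pt_regs user_regs;  /* user-visible ABI, unchanged */
              unsigned long ppr;              /* kernel-only: PPR saved at entry */
          };
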
    • powerpc/ptrace: Don't use sizeof(struct pt_regs) in ptrace code · 3eeacd9f
      Michael Ellerman authored
      Now that we've split the user & kernel versions of pt_regs we need to
      be more careful in the ptrace code.
      
      For now we've ensured the location of the fields in both structs is
      the same, so most of the ptrace code doesn't need updating.
      
      But there are a few places where we use sizeof(pt_regs), and these
      will be wrong as soon as we increase the size of the kernel structure.
      
      So flip them all to use sizeof(user_pt_regs).
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
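
      A hedged illustration of the kind of change this implies (the line
      below is made up for this note, not taken from the patch): wherever
      ptrace sizes a copy of the register frame, the ABI struct should be
      used, so that growing the kernel struct later doesn't change how many
      bytes user space sees.

          -   if (copy_to_user(datap, task->thread.regs, sizeof(struct pt_regs)))
          +   if (copy_to_user(datap, task->thread.regs, sizeof(struct user_pt_regs)))
                  return -EFAULT;
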
    • powerpc: Split user/kernel definitions of struct pt_regs · 002af939
      Michael Ellerman authored
      We use a shared definition for struct pt_regs in uapi/asm/ptrace.h.
      That means the layout of the structure is ABI, ie. we can't change it.
      
      That would be fine if it was only used to describe the user-visible
      register state of a process, but it's also the struct we use in the
      kernel to describe the registers saved in an interrupt frame.
      
      We'd like more flexibility in the content (and possibly layout) of the
      kernel version of the struct, but currently that's not possible.
      
      So split the definition into a user-visible definition which remains
      unchanged, and a kernel internal one.
      
      At the moment they're still identical, and we check that at build
      time. That's because we have code (in ptrace etc.) that assumes that
      they are the same. We will fix that code in future patches, and then
      we can break the strict symmetry between the two structs.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
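
      A condensed sketch of the arrangement described (simplified; the real
      headers carry the full register set and the actual compile-time checks
      differ): the uapi header keeps the ABI-stable layout under the new
      name, the kernel header defines its own struct, and for now an
      assertion keeps the two in lockstep until the ptrace code stops
      depending on that.

          /* uapi/asm/ptrace.h: user-visible, ABI-stable layout */
          struct user_pt_regs {
              unsigned long gpr[32];
              unsigned long nip;
              unsigned long msr;
              /* ... further user-visible registers omitted in this sketch ... */
          };

          /* asm/ptrace.h: kernel-internal frame, free to grow later */
          struct pt_regs {
              unsigned long gpr[32];
              unsigned long nip;
              unsigned long msr;
              /* ... kept identical to user_pt_regs for now ... */
          };

          /* Temporary: ptrace etc. still assume the two structs match exactly. */
          _Static_assert(sizeof(struct pt_regs) == sizeof(struct user_pt_regs),
                         "pt_regs and user_pt_regs must stay in sync for now");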