Commits · 6eeb9b3b9ce588f14a697737a30d0702b5a20293 · Kirill Smelkov / linux

25 Mar, 2020 8 commits

powerpc/64s: Fix section mismatch warnings from boot code · 6eeb9b3b

Michael Ellerman authored Feb 25, 2020

We currently have two section mismatch warnings:

  The function __boot_from_prom() references
  the function __init prom_init().

  The function start_here_common() references
  the function __init start_kernel().

The warnings are correct, we do have branches from non-init code into
init code, which is freed after boot. But we don't expect to ever
execute any of that early boot code after boot, if we did that would
be a bug. In particular calling into OF after boot would be fatal
because OF is no longer resident.

So for now fix the warnings by marking the relevant functions as
__REF, which puts them in the ".ref.text" section.

This causes some reordering of the functions in the final link:

  @@ -217,10 +217,9 @@
   c00000000000b088 t generic_secondary_common_init
   c00000000000b124 t __mmu_off
   c00000000000b14c t __start_initialization_multiplatform
  -c00000000000b1ac t __boot_from_prom
  -c00000000000b1ec t __after_prom_start
  -c00000000000b260 t p_end
  -c00000000000b27c T copy_and_flush
  +c00000000000b1ac t __after_prom_start
  +c00000000000b220 t p_end
  +c00000000000b23c T copy_and_flush
   c00000000000b300 T __secondary_start
   c00000000000b300 t copy_to_here
   c00000000000b344 t start_secondary_prolog
  @@ -228,8 +227,9 @@
   c00000000000b36c t enable_64b_mode
   c00000000000b388 T relative_toc
   c00000000000b3a8 t p_toc
  -c00000000000b3b0 t start_here_common
  -c00000000000b3d0 t start_here_multiplatform
  +c00000000000b3b0 t __boot_from_prom
  +c00000000000b3f0 t start_here_multiplatform
  +c00000000000b480 t start_here_common
   c00000000000b880 T system_call_common
   c00000000000b974 t system_call
   c00000000000b9dc t system_call_exit

In particular __boot_from_prom moves after copy_to_here, which means
it's not copied to zero in the first stage of copy of the kernel to
zero.

But that's OK, because we only call __boot_from_prom before we do the
copy, so it makes no difference when it's copied. The call sequence
is:
  __start
  -> __start_initialization_multiplatform
     -> __boot_from_prom
        -> __start
           -> __start_initialization_multiplatform
              -> __after_prom_start
                 -> copy_and_flush
                 -> copy_and_flush (relocated to 0)
                    -> start_here_multiplatform
                       -> early_setup
Reported-by: Mauricio Faria de Oliveira <mauricfo@linux.ibm.com>
Reported-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225031328.14676-1-mpe@ellerman.id.au

6eeb9b3b

powerpc/xmon: Lower limits on nidump and ndump · d64c7dbb

Michael Ellerman authored Feb 19, 2020

In xmon we have two variables that are used by the dump commands.
There's ndump which is the number of bytes to dump using 'd', and
nidump which is the number of instructions to dump using 'di'.

ndump starts as 64 and nidump starts as 16, but both can be set by the
user.

It's fairly common to be pasting addresses into xmon when trying to
debug something, and if you inadvertently double paste an address like
so:

  0:mon> di c000000002101f6c c000000002101f6c

The second value is interpreted as the number of instructions to dump.

Luckily it doesn't dump 13 quintrillion instructions, the value is
limited to MAX_DUMP (128K). But as each instruction is dumped on a
single line, that's still a lot of output. If you're on a slow console
that can take multiple minutes to print. If you were "just popping in
and out of xmon quickly before the RCU/hardlockup detector fires" you
are now having a bad day.

Things are not as bad with 'd' because we print 16 bytes per line, so
it's fewer lines. But it's still quite a lot.

So shrink the maximum for 'd' to 64K (one page), which is 4096 lines.
For 'di' add a new limit which is the above / 4 - because instructions
are 4 bytes, meaning again we can dump one page.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200219110007.31195-1-mpe@ellerman.id.au

d64c7dbb

powerpc/prom_init: Pass the "os-term" message to hypervisor · 74bb84e5

Alexey Kardashevskiy authored Mar 12, 2020

The "os-term" RTAS calls has one argument with a message address of OS
termination cause. rtas_os_term() already passes it but the recently
added prom_init's version of that missed it; it also does not fill
args correctly.

This passes the message address and initializes the number of arguments.

Fixes: 6a9c930b ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200312074404.87293-1-aik@ozlabs.ru

74bb84e5

powerpc: Replace setup_irq() by request_irq() · b4f00d5b

afzal mohammed authored Mar 12, 2020

request_irq() is preferred over setup_irq(). Invocations of setup_irq()
occur after memory allocators are ready.

Per tglx[1], setup_irq() existed in olden days when allocators were not
ready by the time early interrupts were initialized.

Hence replace setup_irq() by request_irq().

[1] https://lkml.kernel.org/r/alpine.DEB.2.20.1710191609480.1971@nanosSigned-off-by: afzal mohammed <afzal.mohd.ma@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200312064256.18735-1-afzal.mohd.ma@gmail.com

b4f00d5b

powerpc/cell: Use fallthrough; · addf3727

Joe Perches authored Mar 10, 2020

Convert the various uses of fallthrough comments to fallthrough;
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/03073a9a269010ca439e9e658629c44602b0cc9f.1583896348.git.joe@perches.com

addf3727

powerpc/sstep: Fix DS operand in ld encoding to appropriate value · 3e74a0e1

Balamuruhan S authored Mar 11, 2020

ld instruction should have 14 bit immediate field (DS) concatenated
with 0b00 on the right, encode it accordingly. Introduce macro
`IMM_DS()` to encode DS form instructions with 14 bit immediate field.

Fixes: 4ceae137 ("powerpc: emulate_step() tests for load/store instructions")
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200311102405.392263-1-bala24@linux.ibm.com

3e74a0e1

powerpc/pseries: Fix of_read_drc_info_cell() to point at next record · c5e76fa0

Tyrel Datwyler authored Mar 06, 2020

The expectation is that when calling of_read_drc_info_cell()
repeatedly to parse multiple drc-info records that the in/out curval
parameter points at the start of the next record on return. However,
the current behavior has curval still pointing at the final value of
the record just parsed. The result of which is that if the
ibm,drc-info property contains multiple properties the parsed value
of the drc_type for any record after the first has the power_domain
value of the previous record appended to the type string.

eg: observed the following 0xffffffff prepended to PHB

  drc-info: type: \xff\xff\xff\xffPHB, prefix: PHB , index_start: 0x20000001
  drc-info: suffix_start: 1, sequential_elems: 3072, sequential_inc: 1
  drc-info: power-domain: 0xffffffff, last_index: 0x20000c00

In practice PHBs are the only type of connector in the ibm,drc-info
property that has multiple records. So, it breaks PHB hotplug, but by
chance not PCI, CPU, slot, or memory because they happen to only ever
be a single record.

Fix by incrementing curval past the power_domain value to point at
drc_type string of next record.

Fixes: e83636ac ("pseries/drc-info: Search DRC properties for CPU indexes")
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Acked-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200307024547.5748-1-tyreld@linux.ibm.com

c5e76fa0

selftests/powerpc: Don't rely on segfault to rerun the test · 0f8f554e

Gustavo Luiz Duarte authored Feb 11, 2020

The test case tm-signal-context-force-tm expects a segfault to happen
on returning from signal handler, and then does a setcontext() to run
the test again. However, the test doesn't always segfault, causing the
test to run a single time.

This patch fixes the test by putting it within a loop and jumping, via
setcontext, just prior to the loop in case it segfaults. This way we
get the desired behavior (run the test COUNT_MAX times) regardless if
it segfaults or not. This also reduces the use of setcontext for
control flow logic, keeping it only in the segfault handler.

Also, since 'count' is changed within the signal handler, it is
declared as volatile to prevent any compiler optimization getting
confused with asynchronous changes.
Signed-off-by: Gustavo Luiz Duarte <gustavold@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200211033831.11165-3-gustavold@linux.ibm.com

0f8f554e

20 Mar, 2020 3 commits

selftests/powerpc: Add tm-signal-pagefault test · 915b7f6f

Gustavo Luiz Duarte authored Feb 11, 2020

This test triggers a TM Bad Thing by raising a signal in transactional state
and forcing a pagefault to happen in kernelspace when the kernel signal
handling code first touches the user signal stack.

This is inspired by the test tm-signal-context-force-tm but uses userfaultfd to
make the test deterministic. While this test always triggers the bug in one
run, I had to execute tm-signal-context-force-tm several times (the test runs
5000 times each execution) to trigger the same bug.

tm-signal-context-force-tm is kept instead of replaced because, while this test
is more reliable and triggers the same bug, tm-signal-context-force-tm has a
better coverage, in the sense that by running the test several times it might
trigger the pagefault and/or be preempted at different places.

v3: skip test if userfaultfd is unavailable.
Signed-off-by: Gustavo Luiz Duarte <gustavold@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200211033831.11165-2-gustavold@linux.ibm.com

915b7f6f

powerpc/kuap: PPC_KUAP_DEBUG should depend on PPC_KUAP · 61da50b7

Michael Ellerman authored Mar 01, 2020

Currently you can enable PPC_KUAP_DEBUG when PPC_KUAP is disabled,
even though the former has not effect without the latter.

Fix it so that PPC_KUAP_DEBUG can only be enabled when PPC_KUAP is
enabled, not when the platform could support KUAP (PPC_HAVE_KUAP).

Fixes: 890274c2 ("powerpc/64s: Implement KUAP for Radix MMU")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200301111738.22497-1-mpe@ellerman.id.au

61da50b7

selftests/powerpc: Add a test of sigreturn vs VDSO · a0968a02

Michael Ellerman authored Mar 04, 2020

There's two different paths through the sigreturn code, depending on
whether the VDSO is mapped or not. We recently discovered a bug in the
unmapped case, because it's not commonly used these days.

So add a test that sends itself a signal, then moves the VDSO, takes
another signal and finally unmaps the VDSO before sending itself
another signal. That tests the standard signal path, the code that
handles the VDSO being moved, and also the signal path in the case
where the VDSO is unmapped.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200304110402.6038-1-mpe@ellerman.id.au

a0968a02

17 Mar, 2020 7 commits

powerpc/lib: Fix emulate_step() std test · 59ed2adf

Nicholas Piggin authored Feb 26, 2020

We should be checking that the instruction was stepped *and* that the
target register has the right value.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
[mpe: Write change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200226055302.1577954-1-npiggin@gmail.com

59ed2adf

powerpc/64s/radix: Fix CONFIG_SMP=n build · 993cfecc

Nicholas Piggin authored Mar 02, 2020

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200302010410.2957362-1-npiggin@gmail.com

993cfecc

selftests/powerpc: Add tlbie_test in .gitignore · 47bf235f

Christophe Leroy authored Feb 28, 2020

The commit identified below added tlbie_test but forgot to add it in
.gitignore.

Fixes: 93cad5f7 ("selftests/powerpc: Add test case for tlbie vs mtpidr ordering issue")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/259f9c06ed4563c4fa4fa8ffa652347278d769e7.1582847784.git.christophe.leroy@c-s.fr

47bf235f

powerpc/pmac/smp: Drop unnecessary volatile qualifier · a4037d1f

YueHaibing authored Mar 03, 2020

core99_l2_cache/core99_l3_cache do not need to be marked as volatile,
remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200303085604.24952-1-yuehaibing@huawei.com

a4037d1f

powerpc/pmac/smp: Avoid unused-variable warnings · 9451c79b

Ilie Halip authored Sep 20, 2019

When building with ppc64_defconfig, the compiler reports
that these 2 variables are not used:
    warning: unused variable 'core99_l2_cache' [-Wunused-variable]
    warning: unused variable 'core99_l3_cache' [-Wunused-variable]

They are only used when CONFIG_PPC64 is not defined. Move
them into a section which does the same macro check.
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Ilie Halip <ilie.halip@gmail.com>
[mpe: Move them into core99_init_caches() which is their only user]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190920153951.25762-1-ilie.halip@gmail.com

9451c79b

powerpc/fsl_booke: Avoid creating duplicate tlb1 entry · aa411334

Laurentiu Tudor authored Jan 23, 2020

In the current implementation, the call to loadcam_multi() is wrapped
between switch_to_as1() and restore_to_as0() calls so, when it tries
to create its own temporary AS=1 TLB1 entry, it ends up duplicating
the existing one created by switch_to_as1(). Add a check to skip
creating the temporary entry if already running in AS=1.

Fixes: d9e1831a ("powerpc/85xx: Load all early TLB entries at once")
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200123111914.2565-1-laurentiu.tudor@nxp.com

aa411334

tty: evh_bytechan: Fix out of bounds accesses · 3670664b

Stephen Rothwell authored Jan 09, 2020

ev_byte_channel_send() assumes that its third argument is a 16 byte
array. Some places where it is called it may not be (or we can't
easily tell if it is). Newer compilers have started producing warnings
about this, so make sure we actually pass a 16 byte array.

There may be more elegant solutions to this, but the driver is quite
old and hasn't been updated in many years.

The warnings (from a powerpc allyesconfig build) are:

  In file included from include/linux/byteorder/big_endian.h:5,
                   from arch/powerpc/include/uapi/asm/byteorder.h:14,
                   from include/asm-generic/bitops/le.h:6,
                   from arch/powerpc/include/asm/bitops.h:250,
                   from include/linux/bitops.h:29,
                   from include/linux/kernel.h:12,
                   from include/asm-generic/bug.h:19,
                   from arch/powerpc/include/asm/bug.h:109,
                   from include/linux/bug.h:5,
                   from include/linux/mmdebug.h:5,
                   from include/linux/gfp.h:5,
                   from include/linux/slab.h:15,
                   from drivers/tty/ehv_bytechan.c:24:
  drivers/tty/ehv_bytechan.c: In function ‘ehv_bc_udbg_putc’:
  arch/powerpc/include/asm/epapr_hcalls.h:298:20: warning: array subscript 1 is outside array bounds of ‘const char[1]’ [-Warray-bounds]
    298 |  r6 = be32_to_cpu(p[1]);
  include/uapi/linux/byteorder/big_endian.h:40:51: note: in definition of macro ‘__be32_to_cpu’
     40 | #define __be32_to_cpu(x) ((__force __u32)(__be32)(x))
        |                                                   ^
  arch/powerpc/include/asm/epapr_hcalls.h:298:7: note: in expansion of macro ‘be32_to_cpu’
    298 |  r6 = be32_to_cpu(p[1]);
        |       ^~~~~~~~~~~
  drivers/tty/ehv_bytechan.c:166:13: note: while referencing ‘data’
    166 | static void ehv_bc_udbg_putc(char c)
        |             ^~~~~~~~~~~~~~~~

Fixes: dcd83aaf ("tty/powerpc: introduce the ePAPR embedded hypervisor byte channel driver")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
[mpe: Trim warnings from change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200109183912.5fcb52aa@canb.auug.org.au

3670664b

13 Mar, 2020 4 commits

cpufreq: powernv: Fix unsafe notifiers · 966c08de

Oliver O'Halloran authored Feb 06, 2020

The PowerNV cpufreq driver registers two notifiers: one to catch
throttle messages from the OCC and one to bump the CPU frequency back
to normal before a reboot. Both require the cpufreq driver to be
registered in order to function since the notifier callbacks use
various cpufreq_*() functions.

Right now we register both notifiers before we've initialised the
driver. This seems to work, but we should head off any protential
problems by registering the notifiers after the driver is initialised.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200206062622.28235-2-oohall@gmail.com

966c08de

cpufreq: powernv: Fix use-after-free · d0a72efa

Oliver O'Halloran authored Feb 06, 2020

The cpufreq driver has a use-after-free that we can hit if:

a) There's an OCC message pending when the notifier is registered, and
b) The cpufreq driver fails to register with the core.

When a) occurs the notifier schedules a workqueue item to handle the
message. The backing work_struct is located on chips[].throttle and
when b) happens we clean up by freeing the array. Once we get to
the (now free) queued item and the kernel crashes.

Fixes: c5e29ea7 ("cpufreq: powernv: Fix bugs in powernv_cpufreq_{init/exit}")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200206062622.28235-1-oohall@gmail.com

d0a72efa

powerpc/vdso: remove deprecated VDS64_HAS_DESCRIPTORS references · ffd3eaf1

Joe Lawrence authored Feb 24, 2020

The original 2005 patch that introduced the powerpc vdso, pre-git
("ppc64: Implement a vDSO and use it for signal trampoline") notes that:

  ... symbols exposed by the vDSO aren't "normal" function symbols, apps
  can't be expected to link against them directly, the vDSO's are both
  seen as if they were linked at 0 and the symbols just contain offsets
  to the various functions.  This is done on purpose to avoid a
  relocation step (ppc64 functions normally have descriptors with abs
  addresses in them).  When glibc uses those functions, it's expected to
  use it's own trampolines that know how to reach them.

Despite that explanation, there remains dead #ifdef
VDS64_HAS_DESCRIPTORS code-blocks that provide alternate function
definitions that setup function descriptors.

Since VDS64_HAS_DESCRIPTORS has been unused for all these years, we
might as well finally remove it from the codebase.
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200224211848.26087-1-joe.lawrence@redhat.com

ffd3eaf1

powerpc/32: Fix missing NULL pmd check in virt_to_kpte() · cc6f0e39

Christophe Leroy authored Mar 07, 2020

Commit 2efc7c08 ("powerpc/32: drop get_pteptr()"),
replaced get_pteptr() by virt_to_kpte(). But virt_to_kpte() lacks a
NULL pmd check and returns an invalid non NULL pointer when there
is no page table.
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Fixes: 2efc7c08 ("powerpc/32: drop get_pteptr()")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b1177cdfc6af74a3e277bba5d9e708c4b3315ebe.1583575707.git.christophe.leroy@c-s.fr

cc6f0e39

10 Mar, 2020 1 commit

Merge branch 'fixes' into next · 819723a8

Michael Ellerman authored Mar 10, 2020

Merge in our fixes branch. In particular we want to merge the TM and KUAP fixes,
so we can add selftests for them in next.

819723a8

05 Mar, 2020 1 commit

powerpc/mm: Fix missing KUAP disable in flush_coherent_icache() · 59bee45b

Michael Ellerman authored Mar 03, 2020

Stefan reported a strange kernel fault which turned out to be due to a
missing KUAP disable in flush_coherent_icache() called from
flush_icache_range().

The fault looks like:

  Kernel attempted to access user page (7fffc30d9c00) - exploit attempt? (uid: 1009)
  BUG: Unable to handle kernel data access on read at 0x7fffc30d9c00
  Faulting instruction address: 0xc00000000007232c
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  CPU: 35 PID: 5886 Comm: sigtramp Not tainted 5.6.0-rc2-gcc-8.2.0-00003-gfc37a163 #79
  NIP:  c00000000007232c LR: c00000000003b7fc CTR: 0000000000000000
  REGS: c000001e11093940 TRAP: 0300   Not tainted  (5.6.0-rc2-gcc-8.2.0-00003-gfc37a163)
  MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28000884  XER: 00000000
  CFAR: c0000000000722fc DAR: 00007fffc30d9c00 DSISR: 08000000 IRQMASK: 0
  GPR00: c00000000003b7fc c000001e11093bd0 c0000000023ac200 00007fffc30d9c00
  GPR04: 00007fffc30d9c18 0000000000000000 c000001e11093bd4 0000000000000000
  GPR08: 0000000000000000 0000000000000001 0000000000000000 c000001e1104ed80
  GPR12: 0000000000000000 c000001fff6ab380 c0000000016be2d0 4000000000000000
  GPR16: c000000000000000 bfffffffffffffff 0000000000000000 0000000000000000
  GPR20: 00007fffc30d9c00 00007fffc30d8f58 00007fffc30d9c18 00007fffc30d9c20
  GPR24: 00007fffc30d9c18 0000000000000000 c000001e11093d90 c000001e1104ed80
  GPR28: c000001e11093e90 0000000000000000 c0000000023d9d18 00007fffc30d9c00
  NIP flush_icache_range+0x5c/0x80
  LR  handle_rt_signal64+0x95c/0xc2c
  Call Trace:
    0xc000001e11093d90 (unreliable)
    handle_rt_signal64+0x93c/0xc2c
    do_notify_resume+0x310/0x430
    ret_from_except_lite+0x70/0x74
  Instruction dump:
  409e002c 7c0802a6 3c62ff31 3863f6a0 f8010080 48195fed 60000000 48fe4c8d
  60000000 e8010080 7c0803a6 7c0004ac <7c00ffac> 7c0004ac 4c00012c 38210070

This path through handle_rt_signal64() to setup_trampoline() and
flush_icache_range() is only triggered by 64-bit processes that have
unmapped their VDSO, which is rare.

flush_icache_range() takes a range of addresses to flush. In
flush_coherent_icache() we implement an optimisation for CPUs where we
know we don't actually have to flush the whole range, we just need to
do a single icbi.

However we still execute the icbi on the user address of the start of
the range we're flushing. On CPUs that also implement KUAP (Power9)
that leads to the spurious fault above.

We should be able to pass any address, including a kernel address, to
the icbi on these CPUs, which would avoid any interaction with KUAP.
But I don't want to make that change in a bug fix, just in case it
surfaces some strange behaviour on some CPU.

So for now just disable KUAP around the icbi. Note the icbi is treated
as a load, so we allow read access, not write as you'd expect.

Fixes: 890274c2 ("powerpc/64s: Implement KUAP for Radix MMU")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200303235708.26004-1-mpe@ellerman.id.au

59bee45b

04 Mar, 2020 16 commits

powerpc/numa: Remove late request for home node associativity · 247257b0

Srikar Dronamraju authored Jan 29, 2020

With commit ("powerpc/numa: Early request for home node associativity"),
commit 2ea62630 ("powerpc/topology: Get topology for shared
processors at boot") which was requesting home node associativity
becomes redundant.

Hence remove the late request for home node associativity.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200129135301.24739-6-srikar@linux.vnet.ibm.com

247257b0

powerpc/numa: Early request for home node associativity · dc909d8b

Srikar Dronamraju authored Jan 29, 2020

Currently the kernel detects if its running on a shared lpar platform
and requests home node associativity before the scheduler sched_domains
are setup. However between the time NUMA setup is initialized and the
request for home node associativity, workqueue initializes its per node
cpumask. The per node workqueue possible cpumask may turn invalid
after home node associativity resulting in weird situations like
workqueue possible cpumask being a subset of workqueue online cpumask.

This can be fixed by requesting home node associativity earlier just
before NUMA setup. However at the NUMA setup time, kernel may not be in
a position to detect if its running on a shared lpar platform. So
request for home node associativity and if the request fails, fallback
on the device tree property.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200129135301.24739-5-srikar@linux.vnet.ibm.com

dc909d8b

powerpc/numa: Use cpu node map of first sibling thread · 413e4055

Srikar Dronamraju authored Jan 29, 2020

All the sibling threads of a core have to be part of the same node.
To ensure that all the sibling threads map to the same node, always
lookup/update the cpu-to-node map of the first thread in the core.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200129135301.24739-4-srikar@linux.vnet.ibm.com

413e4055

powerpc/numa: Handle extra hcall_vphn error cases · 76b7bfb1

Srikar Dronamraju authored Jan 29, 2020

Currently code handles H_FUNCTION, H_SUCCESS, H_HARDWARE return codes.
However hcall_vphn can return other return codes. Now it also handles
H_PARAMETER return code.  Also the rest return codes are handled under the
default case.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200129135301.24739-3-srikar@linux.vnet.ibm.com

76b7bfb1

powerpc/vphn: Check for error from hcall_vphn · e7214ae9

Srikar Dronamraju authored Jan 29, 2020

There is no value in unpacking associativity, if
H_HOME_NODE_ASSOCIATIVITY hcall has returned an error.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200129135301.24739-2-srikar@linux.vnet.ibm.com

e7214ae9

powerpc/smp: Use nid as fallback for package_id · a05f0e5b

Srikar Dronamraju authored Jan 29, 2020

package_id is to match cores that are part of the same chip. On
PowerNV machines, package_id defaults to chip_id. However ibm,chip_id
property is not present in device-tree of PowerVM LPARs. Hence lscpu
output shows one core per socket and multiple cores.

To overcome this, use nid as the package_id on PowerVM LPARs.

Before the patch:

  Architecture:        ppc64le
  Byte Order:          Little Endian
  CPU(s):              128
  On-line CPU(s) list: 0-127
  Thread(s) per core:  8
  Core(s) per socket:  1                     <----------------------
  Socket(s):           16                    <----------------------
  NUMA node(s):        2
  Model:               2.2 (pvr 004e 0202)
  Model name:          POWER9 (architected), altivec supported
  Hypervisor vendor:   pHyp
  Virtualization type: para
  L1d cache:           32K
  L1i cache:           32K
  L2 cache:            512K
  L3 cache:            10240K
  NUMA node0 CPU(s):   0-63
  NUMA node1 CPU(s):   64-127
  #
  # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
  -1

After the patch:

  Architecture:        ppc64le
  Byte Order:          Little Endian
  CPU(s):              128
  On-line CPU(s) list: 0-127
  Thread(s) per core:  8                     <---------------------
  Core(s) per socket:  8                     <---------------------
  Socket(s):           2
  NUMA node(s):        2
  Model:               2.2 (pvr 004e 0202)
  Model name:          POWER9 (architected), altivec supported
  Hypervisor vendor:   pHyp
  Virtualization type: para
  L1d cache:           32K
  L1i cache:           32K
  L2 cache:            512K
  L3 cache:            10240K
  NUMA node0 CPU(s):   0-63
  NUMA node1 CPU(s):   64-127
  #
  # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
  0

Now lscpu output is more in line with the system configuration.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[mpe: Use pkg_id instead of ppid, tweak change log and comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200129135121.24617-1-srikar@linux.vnet.ibm.com

a05f0e5b

powerpc/irq: Use current_stack_pointer in do_IRQ() · 532d43a7

Christophe Leroy authored Feb 20, 2020

Until commit 7306e83c ("powerpc: Don't use CURRENT_THREAD_INFO to
find the stack"), the current stack base address was obtained by
calling current_thread_info(). That inline function was simply masking
out the value of r1.

In that commit, it was changed to using current_stack_pointer() (since
renamed current_stack_frame()), which is a heavier function as it is
an outline assembly function which cannot be inlined and which reads
the content of the stack at 0(r1).

Convert to using current_stack_pointer for geting r1 and masking out
its value to obtain the base address of the stack pointer as before.

Fixes: 7306e83c ("powerpc: Don't use CURRENT_THREAD_INFO to find the stack")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200220115141.2707-5-mpe@ellerman.id.au

532d43a7

powerpc/irq: use IS_ENABLED() in check_stack_overflow() · 0dec6e1c

Christophe Leroy authored Feb 20, 2020

Instead of #ifdef, use IS_ENABLED(CONFIG_DEBUG_STACKOVERFLOW).
This enable GCC to check for code validity even when the option
is not selected.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200220115141.2707-4-mpe@ellerman.id.au

0dec6e1c

powerpc/irq: Use current_stack_pointer in check_stack_overflow() · 84ab1489

Christophe Leroy authored Feb 20, 2020

The purpose of check_stack_overflow() is to verify that the stack has
not overflowed.

To really know whether the stack pointer is still within boundaries,
the check must be done directly on the value of r1.

So use current_stack_pointer, which returns the current value of r1,
rather than current_stack_frame() which causes a frame to be created
and then returns that value.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200220115141.2707-3-mpe@ellerman.id.au

84ab1489

powerpc: Add current_stack_pointer as a register global · 0e63f015

Christophe Leroy authored Feb 20, 2020

current_stack_frame() doesn't return the stack pointer, but the
caller's stack frame. See commit bfe9a2cf ("powerpc: Reimplement
__get_SP() as a function not a define") and commit
acf620ec ("powerpc: Rename __get_SP() to current_stack_pointer()")
for details.

In some cases this is overkill or incorrect, as it doesn't return the
current value of r1.

So add a current_stack_pointer register global to get the value of r1
directly.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Split out of other patch, tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200220115141.2707-2-mpe@ellerman.id.au

0e63f015

powerpc: Rename current_stack_pointer() to current_stack_frame() · 3d13e839

Michael Ellerman authored Feb 20, 2020

current_stack_pointer(), which was called __get_SP(), used to just
return the value in r1.

But that caused problems in some cases, so it was turned into a
function in commit bfe9a2cf ("powerpc: Reimplement __get_SP() as a
function not a define").

Because it's a function in a separate compilation unit to all its
callers, it has the effect of causing a stack frame to be created, and
then returns the address of that frame. This is good in some cases
like those described in the above commit, but in other cases it's
overkill, we just need to know what stack page we're on.

On some other arches current_stack_pointer is just a register global
giving the stack pointer, and we'd like to do that too. So rename our
current_stack_pointer() to current_stack_frame() to make that
possible.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Link: https://lore.kernel.org/r/20200220115141.2707-1-mpe@ellerman.id.au

3d13e839

powerpc/kernel/sysfs: Add new config option PMU_SYSFS to enable PMU SPRs sysfs file creation · 22697da3

Kajol Jain authored Feb 14, 2020

Many of the performance monitoring unit (PMU) SPRs are
exposed in the sysfs. This may not be a desirable since
"perf" API is the primary interface to program PMU and
collect counter data in the system. But that said, we
cant remove these sysfs files since we dont whether
anyone/anything is using them.

So the patch adds a new CONFIG option 'CONFIG_PMU_SYSFS'
(user selectable) to be used in sysfs file creation for
PMU SPRs. New option by default is disabled, but can be
enabled if user needs it.

Tested this patch behaviour in powernv and pseries machines.
Patch is also tested for pmac32_defconfig.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Nageswara R Sastry <nasastry@in.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200214080606.26872-2-kjain@linux.ibm.com

22697da3

powerpc/kernel/sysfs: Refactor current sysfs.c · fcdb524d

Madhavan Srinivasan authored Feb 14, 2020

An attempt to refactor the current sysfs.c file.
To start with a big chuck of macro #defines and dscr
functions are moved to start of the file. Secondly,
HAS_ #define macros are cleanup based on CONFIG_ options

Finally new HAS_ macro added:
1. HAS_PPC_PA6T (for PA6T) to separate out non-PMU SPRs.
2. HAS_PPC_PMC56 to separate out PMC SPR's from HAS_PPC_PMC_CLASSIC
   which come under CONFIG_PPC64.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200214080606.26872-1-kjain@linux.ibm.com

fcdb524d

powerpc/powernv: Add explicit fast-reboot support · 672e480a

Oliver O'Halloran authored Feb 17, 2020

Add a way to manually invoke a fast-reboot rather than setting the NVRAM
flag. The idea is to allow userspace to invoke a fast-reboot using the
optional string argument to the reboot() system call, or using the xmon
zr command so we don't need to leave around a persistent changes on
a system to use the feature.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200217024833.30580-2-oohall@gmail.com

672e480a

powerpc/powernv: Treat an empty reboot string as default · 16985f2d

Oliver O'Halloran authored Feb 17, 2020

Treat an empty reboot cmd string the same as a NULL string. This squashes a
spurious unsupported reboot message that sometimes gets out when using
xmon.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200217024833.30580-1-oohall@gmail.com

16985f2d

powerpc/Makefile: Mark phony targets as PHONY · d42c6d0f

Michael Ellerman authored Feb 19, 2020

Some of our phony targets are not marked as such. This can lead to
confusing errors, eg:

  $ make clean
  $ touch install
  $ make install
  make: 'install' is up to date.
  $

Fix it by adding them to the PHONY variable which is marked phony in
the top-level Makefile, or in scripts/Makefile.build for the boot
Makefile.
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200219000434.15872-1-mpe@ellerman.id.au

d42c6d0f