1. 21 Dec, 2018 22 commits
    • powerpc: split compat syscall table out from native table · fbf508da
      Firoz Khan authored
      PowerPC uses a syscall table with native and compat calls
      interleaved, which is a slightly simpler way to define two
      matching tables.
      
      As we move to having the tables generated, that advantage
      is no longer important, but the interleaved table gets in
      the way of using the same scripts as on the other
      architectures.
      
      Split out a new compat_sys_call_table symbol that contains
      all the compat calls, and leave the main table for the
      native calls, to more closely match the method we use
      everywhere else.
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
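The difference between the interleaved layout and the split layout described in this commit can be sketched in plain C. This is only an illustrative model, not the kernel's code: the handler functions, the `demo_` table names, and the `2 * nr + is_compat` indexing rule are assumptions chosen to make the idea concrete.

```c
#include <stddef.h>

/* Hypothetical handlers, for illustration only. */
typedef long (*syscall_fn)(long arg);

static long native_read(long arg)  { return arg + 1; }
static long compat_read(long arg)  { return arg + 2; }
static long native_write(long arg) { return arg + 3; }
static long compat_write(long arg) { return arg + 4; }

enum { DEMO_NR_SYSCALLS = 2 };

/* Old layout: one table, native and compat entries interleaved. */
static const syscall_fn demo_interleaved_table[2 * DEMO_NR_SYSCALLS] = {
    native_read,  compat_read,
    native_write, compat_write,
};

/* New layout: one table per ABI, both indexed by syscall number. */
static const syscall_fn demo_sys_call_table[DEMO_NR_SYSCALLS] = {
    native_read, native_write,
};
static const syscall_fn demo_compat_sys_call_table[DEMO_NR_SYSCALLS] = {
    compat_read, compat_write,
};

/* Lookup in the interleaved table: the ABI selects the slot in each pair. */
static syscall_fn lookup_interleaved(size_t nr, int is_compat)
{
    return demo_interleaved_table[2 * nr + (is_compat ? 1 : 0)];
}

/* Lookup in the split tables: the ABI selects the table. */
static syscall_fn lookup_split(size_t nr, int is_compat)
{
    return is_compat ? demo_compat_sys_call_table[nr]
                     : demo_sys_call_table[nr];
}
```

Both lookups resolve to the same handler for a given number and ABI. The split form lets each table be generated independently from a per-ABI list.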
    • powerpc: move macro definition from asm/systbl.h · a11b763d
      Firoz Khan authored
      Move the macro definition for compat_sys_sigsuspend from
      asm/systbl.h to the file in which it is included.
      
      One of the patches in this series generates the uapi
      header and syscall table files. To arrive at a common
      implementation across all architectures, we need to make
      this change.
      
      This change will simplify the implementation of the system
      call table generation script and help toward a common
      implementation across all architectures.
      Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: add __NR_syscalls along with NR_syscalls · 8a19eeea
      Firoz Khan authored
      The NR_syscalls macro holds the number of system calls
      that exist on the powerpc architecture. We have to change
      the value of NR_syscalls if we add or delete a system call.
      
      One of the patches in this series has a script which
      will generate a uapi header based on the syscall.tbl file.
      The syscall.tbl file contains the system call number
      information. So we have two options for updating the
      NR_syscalls value.
      
      1. Update NR_syscalls in asm/unistd.h manually by counting
         the number of system calls. There is no need to update
         NR_syscalls until we either add a new system call or
         delete an existing system call.
      
      2. Keep this feature in the above-mentioned script, which
         will count the number of syscalls and keep it in a
         generated file. In this case we don't need to explicitly
         update NR_syscalls in the asm/unistd.h file.
      
      The 2nd option is the recommended one. For that, I added
      the __NR_syscalls macro in uapi/asm/unistd.h along with
      NR_syscalls in asm/unistd.h. The __NR_syscalls macro is
      also added to keep the naming convention the same across
      all architectures. While __NR_syscalls isn't strictly part
      of the uapi, having it as part of the generated header
      simplifies the implementation. We also need to enclose
      this macro in #ifdef __KERNEL__ to avoid side effects.
      Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
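Option 2 above amounts to deriving the syscall count from the table instead of maintaining it by hand. The following is a minimal C sketch of that idea, not the series' actual script. It assumes each entry line of a syscall.tbl-style text starts with the decimal syscall number, that comment lines start with `#`, and that the count is the highest number plus one. The function name `count_syscalls` is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return (highest syscall number + 1) found in a syscall.tbl-style
 * text buffer, or 0 if the buffer holds no entry lines.
 * Assumption: an entry line begins with its decimal syscall number;
 * any other line (comment, blank) is ignored.
 */
static long count_syscalls(const char *tbl)
{
    long highest = -1;
    const char *line = tbl;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        const char *p = line;

        /* Skip leading blanks on the line. */
        while (*p == ' ' || *p == '\t')
            p++;

        /* Only lines that start with a digit are table entries. */
        if (*p >= '0' && *p <= '9') {
            long nr = strtol(p, NULL, 10);
            if (nr > highest)
                highest = nr;
        }

        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return highest + 1;
}
```

A generator built this way would emit the result as the value of `__NR_syscalls`, so adding a line to the table updates the count with no manual edit.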
    • powerpc/pkeys: Fix handling of pkey state across fork() · 2cd4bd19
      Ram Pai authored
      Protection key tracking information is not copied over to the
      mm_struct of the child during fork(). This can cause the child to
      erroneously allocate keys that were already allocated. Any allocated
      execute-only key is lost as well.
      
      Add code, called by dup_mmap(), to copy the pkey state from parent to
      child explicitly.
      
      This problem was originally found by Dave Hansen on x86, and it turns
      out to be a problem on powerpc as well.
      
      Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
      Cc: stable@vger.kernel.org # v4.16+
      Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Signed-off-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
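The failure mode and the fix can be modelled in user-space C. This is a sketch under stated assumptions, not the kernel's data structures: `struct demo_mm`, its two fields, and the `demo_` helpers are hypothetical stand-ins for the per-address-space pkey state the commit message refers to.

```c
#include <stdint.h>

/* Hypothetical stand-in for the pkey fields kept per address space. */
struct demo_mm {
    uint32_t pkey_allocation_map;  /* bit n set => key n is allocated */
    int execute_only_pkey;         /* -1 when no execute-only key exists */
};

/* Copy the parent's pkey state into the child, as a fork-time hook would. */
static void demo_dup_pkeys(const struct demo_mm *parent, struct demo_mm *child)
{
    child->pkey_allocation_map = parent->pkey_allocation_map;
    child->execute_only_pkey = parent->execute_only_pkey;
}

/* Allocate the lowest free key, or return -1 if none is left. */
static int demo_pkey_alloc(struct demo_mm *mm)
{
    for (int key = 0; key < 32; key++) {
        uint32_t bit = UINT32_C(1) << key;

        if (!(mm->pkey_allocation_map & bit)) {
            mm->pkey_allocation_map |= bit;
            return key;
        }
    }
    return -1;
}
```

Without the copy step, a child starting from an empty map would hand out keys the parent had already allocated, and the execute-only key would be forgotten.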
    • ocxl: Fix endiannes bug in read_afu_name() · 2f07229f
      Greg Kurz authored
      The AFU Descriptor Template in the PCI config space has a Name Space
      field, which is a 24-byte ASCII character string giving a descriptive
      name space for the AFU. The OCXL driver reads the string four
      characters at a time with pci_read_config_dword().
      
      This optimization is valid on a little-endian system since this is PCI,
      but a big-endian system ends up with each subset of four characters in
      reverse order.
      
      This could be fixed by switching to read characters one by one. Another
      option is to swap the bytes if we're big-endian.
      
      Go for the latter with le32_to_cpu().
      
      Cc: stable@vger.kernel.org      # v4.16
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
      Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
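The byte-order problem described in this commit can be reproduced with a small portable C model. It is a sketch, not the driver's code: host endianness is passed in as a parameter so that both cases can be exercised on any machine, and every `demo_` function is a hypothetical stand-in (for example, `demo_le32_to_cpu()` mimics what le32_to_cpu() does on the modelled host).

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the four bytes of a 32-bit value. */
static uint32_t demo_swab32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Model of le32_to_cpu(): a no-op on little-endian, a swap on big-endian. */
static uint32_t demo_le32_to_cpu(uint32_t v, int big_endian)
{
    return big_endian ? demo_swab32(v) : v;
}

/* Model of a config-space dword read: the bytes are little-endian. */
static uint32_t demo_read_config_dword(const uint8_t *cfg, size_t off)
{
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

/* Store a 32-bit value to memory in the modelled host byte order. */
static void demo_store_u32(uint8_t *dst, uint32_t v, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 24 - 8 * i : 8 * i;

        dst[i] = (uint8_t)(v >> shift);
    }
}

/*
 * Read a name of len bytes (a multiple of 4) four characters at a time.
 * name must have room for len + 1 bytes.
 */
static void demo_read_name(const uint8_t *cfg, char *name, size_t len,
                           int big_endian, int apply_fix)
{
    for (size_t i = 0; i < len; i += 4) {
        uint32_t val = demo_read_config_dword(cfg, i);

        if (apply_fix)
            val = demo_le32_to_cpu(val, big_endian);
        demo_store_u32((uint8_t *)name + i, val, big_endian);
    }
    name[len] = '\0';
}
```

With `big_endian` set and `apply_fix` clear, each group of four characters comes out reversed. Applying the swap restores the original order, and on the little-endian model the swap is a no-op.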
    • selftests/powerpc: Add checks for transactional sigreturn · 34642d70
      Breno Leitao authored
      This is a new test case that creates a signal and starts a suspended
      transaction inside the signal handler.
      
      It returns from the signal handler with the CPU in the suspended state,
      but without setting the user context MSR Transaction State (TS) field.
      
      The kernel signal handler code should be able to handle this discrepancy
      instead of crashing.
      
      This code can be compiled and used to test both 32-bit and 64-bit signal
      handlers.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/tm: Unset MSR[TS] if not recheckpointing · 6f5b9f01
      Breno Leitao authored
      There is a TM Bad Thing bug that can be caused when you return from a
      signal context in a suspended transaction but with ucontext MSR[TS] unset.
      
      This forces regs->msr[TS] to be set at syscall entry (since the CPU
      state is transactional). It also calls treclaim() to flush the transaction
      state, which is done based on the live (mfmsr) MSR state.
      
      Since user context MSR[TS] is not set, restore_tm_sigcontexts() is not
      called, so recheckpoint is not executed and the CPU state stays
      non-transactional. When calling rfid, SRR1 will have MSR[TS] set, but the
      CPU state is non-transactional, causing the TM Bad Thing with the
      following stack:
      
      	[   33.862316] Bad kernel stack pointer 3fffd9dce3e0 at c00000000000c47c
      	cpu 0x8: Vector: 700 (Program Check) at [c00000003ff7fd40]
      	    pc: c00000000000c47c: fast_exception_return+0xac/0xb4
      	    lr: 00003fff865f442c
      	    sp: 3fffd9dce3e0
      	   msr: 8000000102a03031
      	  current = 0xc00000041f68b700
      	  paca    = 0xc00000000fb84800   softe: 0        irq_happened: 0x01
      	    pid   = 1721, comm = tm-signal-sigre
      	Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
      	WARNING: exception is not recoverable, can't continue
      
      The same problem happens in the 32-bit signal handler, and the fix is very
      similar: if tm_recheckpoint() is not executed, then regs->msr[TS] should be
      zeroed.
      
      This patch also fixes a sparse warning related to lack of indentation when
      CONFIG_PPC_TRANSACTIONAL_MEM is set.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      CC: Stable <stable@vger.kernel.org>	# 3.10+
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Tested-by: Michal Suchánek <msuchanek@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/tm: Print scratch value · 11be3958
      Breno Leitao authored
      A TM Bad Thing exception is usually raised for one of three reasons:
      a) touching SPRs in an active transaction; b) using a TM instruction with
      the facility disabled; and c) setting a wrong MSR/SRR1 at RFID.
      
      The first two cases are easy to identify by looking at the instructions.
      The last case is harder, because the MSR is masked after RFID, so it is
      very useful to look at the previous MSR (SRR1) before RFID as well as the
      current, masked MSR.
      
      Since the MSR is saved in the paca just before RFID, this patch prints it
      if a TM Bad Thing happens, helping to identify the invalid TM transition
      that is causing the exception.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/tm: Save MSR to PACA before RFID · 63a0d6b0
      Breno Leitao authored
      As at other exit points, move SRR1 (MSR) into paca->tm_scratch so that,
      if there is a TM Bad Thing in RFID, it is easy to see which SRR1 value
      was being used.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/tm: Set MSR[TS] just prior to recheckpoint · e1c3743e
      Breno Leitao authored
      On a signal handler return, the user could set a context with MSR[TS] bits
      set, and these bits would be copied to task regs->msr.
      
      At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
      several __get_user() are called and then a recheckpoint is executed.
      
      This is a problem since a page fault (in kernel space) could happen when
      calling __get_user(). If it happens, the process MSR[TS] bits were
      already set, but recheckpoint was not executed, and SPRs are still invalid.
      
      The page fault can cause the current process to be de-scheduled, with
      MSR[TS] active and without tm_recheckpoint() being called.  More
      importantly, without TEXASR[FS] bit set also.
      
      TEXASR might not have the FS bit set, so when the process is scheduled
      back in, it will try to reclaim, which will be aborted because the CPU
      is not in the suspended state, and then recheckpoint. This recheckpoint
      will restore thread->texasr into the TEXASR SPR, which might be zero,
      hitting a BUG_ON().
      
      	kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
      	cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0]
      	    pc: c000000000054550: restore_gprs+0xb0/0x180
      	    lr: 0000000000000000
      	    sp: c00000041f157950
      	   msr: 8000000100021033
      	  current = 0xc00000041f143000
      	  paca    = 0xc00000000fb86300	 softe: 0	 irq_happened: 0x01
      	    pid   = 1021, comm = kworker/11:1
      	kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
      	Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
      	enter ? for help
      	[c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0
      	[c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0
      	[c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990
      	[c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0
      	[c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610
      	[c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140
      	[c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c
      
      This patch simply delays the MSR[TS] set, so, if there is any page fault in
      the __get_user() section, it does not have regs->msr[TS] set, since the TM
      structures are still invalid, thus avoiding doing TM operations for
      in-kernel exceptions and possible process reschedule.
      
      With this patch, the MSR[TS] will only be set just before recheckpointing
      and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
      invalid state.
      
      In addition, if CONFIG_PREEMPT is set, a preemption might happen just
      after setting MSR[TS] and before tm_recheckpoint(). This block must
      therefore be atomic from a preemption perspective, so
      preempt_disable/enable() is called around this code.
      
      It is not possible to move tm_recheckpoint to happen earlier, because it is
      required to get the checkpointed registers from userspace, with
      __get_user(), thus, the only way to avoid this undesired behavior is
      delaying the MSR[TS] set.
      
      The 32-bit signal handler seems to be safe from this issue, but it
      might be exposed to the preemption issue, so preemption is disabled in
      that chunk of code as well.
      
      Changes from v2:
       * Run the critical section with preempt_disable.
      
      Fixes: 87b4e539 ("powerpc/tm: Fix return of active 64bit signals")
      Cc: stable@vger.kernel.org (v3.9+)
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/fadump: Do not allow hot-remove memory from fadump reserved area. · 0db6896f
      Mahesh Salgaonkar authored
      For fadump to work successfully there should not be any holes in the
      reserved memory ranges where the kernel has asked firmware to move the
      content of old kernel memory in the event of a crash. Now that fadump
      uses CMA for the reserved area, this memory area is not protected from
      hot-remove operations unless it is CMA allocated. Hence, the fadump
      service can fail to re-register after a hot-remove operation if the
      hot-removed memory belongs to the fadump reserved region. To avoid
      this, make sure that memory from the fadump reserved area is not
      hot-removable if fadump is registered.
      
      However, if the user still wants to remove that memory, they can do so
      by manually stopping the fadump service before the hot-remove operation.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/fadump: Throw proper error message on fadump registration failure · f86593be
      Mahesh Salgaonkar authored
      fadump fails to register when there are holes in the reserved memory
      area. This can happen if the user has hot-removed memory that falls in
      the fadump reserved memory area. Throw a meaningful error message to
      the user in such a case.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      [mpe: is_reserved_memory_area_contiguous() returns bool, unsplit string]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
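The "holes in the reserved area" condition can be illustrated with a small range check. This is a sketch of the general technique only, not the implementation of is_reserved_memory_area_contiguous(): the `struct demo_range` type, the function name, and the assumption that present memory is given as a sorted, non-overlapping array are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one present memory range, [start, end). */
struct demo_range {
    uint64_t start;
    uint64_t end;   /* exclusive */
};

/*
 * Return true if [start, end) is fully covered by the present ranges,
 * i.e. the area has no holes. Assumption: present[] is sorted by start
 * address and its entries do not overlap.
 */
static bool demo_area_is_contiguous(const struct demo_range *present,
                                    size_t n, uint64_t start, uint64_t end)
{
    uint64_t covered = start;   /* [start, covered) is known to be present */

    for (size_t i = 0; i < n && covered < end; i++) {
        if (present[i].end <= covered)
            continue;           /* range lies entirely below the frontier */
        if (present[i].start > covered)
            return false;       /* hole begins at 'covered' */
        covered = present[i].end;
    }
    return covered >= end;
}
```

A registration path could run such a check first and print a specific message when it fails, rather than reporting a generic registration error.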
    • powerpc/fadump: Reservationless firmware assisted dump · a4e92ce8
      Mahesh Salgaonkar authored
      One of the primary issues with Firmware Assisted Dump (fadump) on Power
      is that it needs a large amount of memory to be reserved. On large
      systems with terabytes of memory, this reservation can be quite
      significant.
      
      In some cases, fadump fails if the memory reserved is insufficient, or
      if the reserved memory was DLPAR hot-removed.
      
      In the normal case, post reboot, the preserved memory is filtered to
      extract only relevant areas of interest using the makedumpfile tool.
      While the tool provides flexibility to determine what needs to be part
      of the dump and what memory to filter out, all supported distributions
      default this to "Capture only kernel data and nothing else".
      
      We take advantage of this default and the Linux kernel's Contiguous
      Memory Allocator (CMA) to fundamentally change the memory reservation
      model for fadump.
      
      Instead of setting aside a significant chunk of memory nobody can use,
      this patch uses CMA to reserve a significant chunk of memory that the
      kernel is prevented from using (due to MIGRATE_CMA), but that
      applications are free to use. With this, fadump will still be able
      to capture all of the kernel memory and most of the user space memory,
      except the user pages that were present in the CMA region.
      
      Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream:
      [root@zzxx-yy10 ~]# free -m
                    total        used        free      shared  buff/cache   available
      Mem:           7557         193        6822          12         541        6725
      Swap:          4095           0        4095
      
      With this patch:
      [root@zzxx-yy10 ~]# free -m
                    total        used        free      shared  buff/cache   available
      Mem:           8133         194        7464          12         475        7338
      Swap:          4095           0        4095
      
      Changes made here are completely transparent to how fadump has
      traditionally worked.
      
      Thanks to Aneesh Kumar and Anshuman Khandual for helping us understand
      CMA and its usage.
      
      TODO:
      - Handle case where CMA reservation spans nodes.
      Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Move opal_power_control_init() call in opal_init(). · 08fb726d
      Mahesh Salgaonkar authored
      opal_power_control_init() depends on the opal message notifier being
      initialized, which is done in opal_init()->opal_message_init(). But both
      of these initializations are called through machine initcalls, so it all
      depends on the order in which they are called. So far they have been
      called in the correct order (maybe we got lucky) and we never saw any
      issue. But it is clearer to control the initialization order explicitly
      by moving opal_power_control_init() into opal_init().
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/4xx: Delete an unnecessary return statement in two functions · ae6263cc
      Markus Elfring authored
      The script "checkpatch.pl" pointed out information like the following.
      
      WARNING: void function return statements are not generally useful
      
      Thus remove such a statement in the affected functions.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/4xx: Delete error message for a ENOMEM in two functions · a8d5dada
      Markus Elfring authored
      Omit an extra message for a memory allocation failure in these
      functions.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/4xx: Use seq_putc() in ocm_debugfs_show() · 52930bc6
      Markus Elfring authored
      A single character (line break) should be put into a sequence.
      Thus use the corresponding function "seq_putc".
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/4xx: Combine four seq_printf() calls into two in ocm_debugfs_show() · b52106a0
      Markus Elfring authored
      Some data were printed into a sequence by four separate function calls.
      Print the same data with two function calls instead.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/8xx: Allow pinning IMMR TLB when using early debug console · 96d19d70
      Christophe Leroy authored
      CONFIG_EARLY_DEBUG_CPM requires the IMMR area TLB to be pinned,
      otherwise it doesn't survive MMU_init and the boot fails.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Remove PCI_MSI ifdef checks · 5f639e5f
      Oliver O'Halloran authored
      CONFIG_PCI_MSI was made mandatory by commit a311e738
      ("powerpc/powernv: Make PCI non-optional") so the #ifdef
      checks around CONFIG_PCI_MSI here can be removed entirely.
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/fsl-rio: fix spelling mistake "reserverd" -> "reserved" · a0837876
      Alexandre Belloni authored
      Fix a spelling mistake in a register description.
      Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • Powerpc/perf: Wire up PMI throttling · 0c9108b0
      Ravi Bangoria authored
      Commit 14c63f17 ("perf: Drop sample rate when sampling is too
      slow") introduced a way to throttle PMU interrupts if we're spending
      too much time just processing those. Wire up powerpc PMI handler to
      use this infrastructure.
      
      We have throttling of the *rate* of interrupts, but this adds
      throttling based on the *time taken* to process the interrupts.
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 20 Dec, 2018 18 commits