1. 08 Oct, 2018 1 commit
  2. 04 Oct, 2018 9 commits
    • powerpc/64s/radix: Explicitly flush ERAT with local LPID invalidation · 053c5a75
      Nicholas Piggin authored
      Local radix TLB flush operations that operate on congruence classes
      have explicit ERAT flushes for POWER9. The process scoped LPID flush
      did not have a flush, so add it.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Do not use PPC_INVALIDATE_ERAT on CPUs before POWER9 · bc276ecb
      Nicholas Piggin authored
      PPC_INVALIDATE_ERAT is slbia IH=7 which is a new variant introduced
      with POWER9, and the result is undefined on earlier CPUs.
      
      Commits 7b9f71f9 ("powerpc/64s: POWER9 machine check handler") and
      d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on
      POWER9") caused POWER7/8 code to use this instruction. Remove it. An
      ERAT flush can be made by invalidating the SLB, but before POWER9 that
      requires a flush and rebolt.
      
      Fixes: 7b9f71f9 ("powerpc/64s: POWER9 machine check handler")
      Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/time: Add set_state_oneshot_stopped decrementer callback · 81759360
      Anton Blanchard authored
      If CONFIG_PPC_WATCHDOG is enabled we always cap the decrementer to
      0x7fffffff:
      
              if (IS_ENABLED(CONFIG_PPC_WATCHDOG))
                      set_dec(0x7fffffff);
              else
                      set_dec(decrementer_max);
      
      If there are no future events, we don't reprogram the decrementer
      after this and we end up with 0x7fffffff even on a large decrementer
      capable system.
      
      As suggested by Nick, add a set_state_oneshot_stopped callback
      so we program the decrementer with decrementer_max if there are
      no future events.
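
      The behaviour described above can be sketched as a small userspace model. This is not the kernel code: `struct dec_model` and the `model_*` functions are hypothetical names invented for illustration, and the `watchdog_enabled` field merely stands in for CONFIG_PPC_WATCHDOG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the decrementer state; not kernel code. */
struct dec_model {
    uint64_t dec;             /* last value programmed into the decrementer */
    uint64_t decrementer_max; /* largest value the hardware supports */
    bool watchdog_enabled;    /* stands in for CONFIG_PPC_WATCHDOG */
};

/* Models the snippet quoted above: the value is capped when the watchdog is on. */
void model_timer_interrupt(struct dec_model *m)
{
    assert(m);
    if (m->watchdog_enabled)
        m->dec = 0x7fffffff;
    else
        m->dec = m->decrementer_max;
}

/* Models the new callback: with no future events, program the maximum. */
int model_set_state_oneshot_stopped(struct dec_model *m)
{
    assert(m);
    m->dec = m->decrementer_max;
    return 0;
}
```

      In this model, calling model_timer_interrupt() with the watchdog enabled leaves 0x7fffffff in the decrementer, and a later call to model_set_state_oneshot_stopped() replaces it with decrementer_max.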
      Signed-off-by: Anton Blanchard <anton@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer · 8b78fdb0
      Anton Blanchard authored
      We currently cap the decrementer clockevent at 4 seconds, even on systems
      with large decrementer support. Fix this by converting the code to use
      clockevents_register_device() which calculates the upper bound based on
      the max_delta passed in.
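
      As a rough check of the 4 second figure, the arithmetic below converts a decrementer count into seconds. The 512 MHz timebase frequency used in the usage note is an assumption (it is not stated in the commit message), and `dec_ticks_to_seconds` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a decrementer count to seconds for a given timebase frequency. */
double dec_ticks_to_seconds(uint64_t ticks, uint64_t timebase_hz)
{
    assert(timebase_hz != 0);
    return (double)ticks / (double)timebase_hz;
}
```

      With an assumed 512 MHz timebase, dec_ticks_to_seconds(0x7fffffff, 512000000) comes out at a little under 4.2 seconds, which is consistent with the cap mentioned above.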
      Signed-off-by: Anton Blanchard <anton@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/npu: Remove atsd_threshold debugfs setting · f86ad3e0
      Mark Hairgrove authored
      This threshold is no longer used now that all invalidates issue a single
      ATSD to each active NPU.
      Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Reviewed-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/npu: Use size-based ATSD invalidates · 3689c37d
      Mark Hairgrove authored
      Prior to this change only two types of ATSDs were issued to the NPU:
      invalidates targeting a single page and invalidates targeting the whole
      address space. The crossover point happened at the configurable
      atsd_threshold which defaulted to 2M. Invalidates that size or smaller
      would issue per-page invalidates for the whole range.
      
      The NPU supports more invalidation sizes however: 64K, 2M, 1G, and all.
      These invalidates target addresses aligned to their size. 2M is a common
      invalidation size for GPU-enabled applications because that is a GPU
      page size, so reducing the number of invalidates by 32x in that case is a
      clear improvement.
      
      ATSD latency is high in general so now we always issue a single invalidate
      rather than multiple. This will over-invalidate in some cases, but for any
      invalidation size over 2M it matches or improves the prior behavior.
      There's also an improvement for single-page invalidates since the prior
      version issued two invalidates for that case instead of one.
      
      With this change all issued ATSDs now perform a flush, so the flush
      parameter has been removed from all the helpers.
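
      One way to picture the size selection is sketched below: pick the smallest supported size whose naturally aligned block contains the whole range, and fall back to invalidating everything otherwise. This is an illustration under that assumption, not the code from the patch; `pick_invalidate_size` and the SZ_* macros here are defined locally for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K (UINT64_C(1) << 16)
#define SZ_2M  (UINT64_C(1) << 21)
#define SZ_1G  (UINT64_C(1) << 30)

/*
 * Returns the invalidation size in bytes for [start, start + len),
 * or 0 meaning "invalidate the whole address space".
 * Assumes start + len does not wrap around.
 */
uint64_t pick_invalidate_size(uint64_t start, uint64_t len)
{
    static const uint64_t sizes[] = { SZ_64K, SZ_2M, SZ_1G };
    uint64_t end;
    unsigned int i;

    assert(len > 0);
    end = start + len - 1; /* address of the last byte in the range */

    for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        uint64_t mask = ~(sizes[i] - 1);

        /* First and last byte fall in the same aligned block of this size. */
        if ((start & mask) == (end & mask))
            return sizes[i];
    }
    return 0;
}
```

      In this sketch an aligned 2M range maps to a single 2M invalidate instead of 2M / 64K = 32 per-page ones, while a 2M range that straddles a 2M boundary is widened to 1G, which is the over-invalidation the message mentions.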
      
      To show the benefit here are some performance numbers from a
      microbenchmark which creates a 1G allocation then uses mprotect with
      PROT_NONE to trigger invalidates in strides across the allocation.
      
      One NPU (1 GPU):
      
               mprotect rate (GB/s)
      Stride   Before      After      Speedup
      64K         5.3        5.6           5%
      1M         39.3       57.4          46%
      2M         49.7       82.6          66%
      4M        286.6      285.7           0%
      
      Two NPUs (6 GPUs):
      
               mprotect rate (GB/s)
      Stride   Before      After      Speedup
      64K         6.5        7.4          13%
      1M         33.4       67.9         103%
      2M         38.7       93.1         141%
      4M        356.7      354.6          -1%
      
      Anything over 2M is roughly the same as before since both cases issue a
      single ATSD.
      Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Reviewed-By: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/npu: Reduce eieio usage when issuing ATSD invalidates · 7ead15a1
      Mark Hairgrove authored
      There are two types of ATSDs issued to the NPU: invalidates targeting a
      specific virtual address and invalidates targeting the whole address
      space. In both cases prior to this change, the sequence was:
      
          for each NPU
              - Write the target address to the XTS_ATSD_AVA register
              - EIEIO
              - Write the launch value to issue the ATSD
      
      First, a target address is not required when invalidating the whole
      address space, so that write and the EIEIO have been removed. The AP
      (size) field in the launch is not needed either.
      
      Second, for per-address invalidates the above sequence is inefficient in
      the common case of multiple NPUs because an EIEIO is issued per NPU. This
      unnecessarily forces the launches of later ATSDs to be ordered with the
      launches of earlier ones. The new sequence only issues a single EIEIO:
      
          for each NPU
              - Write the target address to the XTS_ATSD_AVA register
          EIEIO
          for each NPU
              - Write the launch value to issue the ATSD
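
      The two sequences can be compared with a small model that only counts barriers. Everything here is hypothetical scaffolding: `struct npu_model` stands in for the per-NPU registers, and `model_eieio()` just increments a counter instead of issuing a real EIEIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one NPU's ATSD registers. */
struct npu_model {
    uint64_t ava;    /* models the XTS_ATSD_AVA register */
    uint64_t launch; /* models the launch register */
};

unsigned int eieio_count; /* number of modelled barriers issued */

void model_eieio(void)
{
    eieio_count++;
}

/* Old sequence: one barrier per NPU. */
void atsd_per_npu_barrier(struct npu_model *npus, size_t n,
                          uint64_t va, uint64_t launch)
{
    size_t i;

    assert(npus);
    for (i = 0; i < n; i++) {
        npus[i].ava = va;
        model_eieio();
        npus[i].launch = launch;
    }
}

/* New sequence: all address writes, one barrier, then all launches. */
void atsd_single_barrier(struct npu_model *npus, size_t n,
                         uint64_t va, uint64_t launch)
{
    size_t i;

    assert(npus);
    for (i = 0; i < n; i++)
        npus[i].ava = va;
    model_eieio();
    for (i = 0; i < n; i++)
        npus[i].launch = launch;
}
```

      Both functions leave the same values in every modelled register; with n NPUs the first issues n barriers and the second issues one.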
      
      Performance results were gathered using a microbenchmark which creates a
      1G allocation then uses mprotect with PROT_NONE to trigger invalidates in
      strides across the allocation.
      
      With only a single NPU active (one GPU) the difference is in the noise for
      both types of invalidates (+/-1%).
      
      With two NPUs active (on a 6-GPU system) the effect is more noticeable:
      
               mprotect rate (GB/s)
      Stride   Before      After      Speedup
      64K         5.9        6.5          10%
      1M         31.2       33.4           7%
      2M         36.3       38.7           7%
      4M        322.6      356.7          11%
      Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Reviewed-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: remove leftover code of old GCC version checks · bad96de8
      Masahiro Yamada authored
      Clean up the leftover of commit f2910f0e ("powerpc: remove old
      GCC version checks").
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/nohash: fix undefined behaviour when testing page size support · f5e28480
      Daniel Axtens authored
      When enumerating page size definitions to check hardware support,
      we construct a constant which is (1U << (def->shift - 10)).
      
      However, the array of page size definitions is only initialised for
      various MMU_PAGE_* constants, so it contains a number of 0-initialised
      elements with def->shift == 0. This means we end up shifting by a
      very large number, which gives the following UBSan splat:
      
      ================================================================================
      UBSAN: Undefined behaviour in /home/dja/dev/linux/linux/arch/powerpc/mm/tlb_nohash.c:506:21
      shift exponent 4294967286 is too large for 32-bit type 'unsigned int'
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-00045-ga604f927b012-dirty #6
      Call Trace:
      [c00000000101bc20] [c000000000a13d54] .dump_stack+0xa8/0xec (unreliable)
      [c00000000101bcb0] [c0000000004f20a8] .ubsan_epilogue+0x18/0x64
      [c00000000101bd30] [c0000000004f2b10] .__ubsan_handle_shift_out_of_bounds+0x110/0x1a4
      [c00000000101be20] [c000000000d21760] .early_init_mmu+0x1b4/0x5a0
      [c00000000101bf10] [c000000000d1ba28] .early_setup+0x100/0x130
      [c00000000101bf90] [c000000000000528] start_here_multiplatform+0x68/0x80
      ================================================================================
      
      Fix this by first checking if the element exists (shift != 0) before
      constructing the constant.
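
      The guard can be illustrated in isolation. The sketch below is not the code from tlb_nohash.c: `struct page_size_def` is reduced to the one field that matters and `page_size_bit` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-in for a page size definition; only the shift matters here. */
struct page_size_def {
    unsigned int shift; /* 0 means the array slot was never initialised */
};

/* Returns the bit used to test hardware support, or 0 for an unused slot. */
uint32_t page_size_bit(const struct page_size_def *def)
{
    assert(def);
    if (def->shift == 0)
        return 0; /* skip: shifting by (0 - 10) would be undefined */

    assert(def->shift >= 10 && def->shift - 10 < 32);
    return UINT32_C(1) << (def->shift - 10);
}
```

      Without the check, an unused slot would compute 0 - 10 in unsigned arithmetic, which wraps to 4294967286 on a 32-bit unsigned int, the exponent reported in the UBSan splat.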
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 03 Oct, 2018 30 commits
    • powerpc: Wire up memtest · d90fe2ac
      Christophe Leroy authored
      Add a call to early_memtest() so that kernels compiled with
      CONFIG_MEMTEST really perform a memtest at startup when requested
      via the 'memtest' boot parameter.
      Tested-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Don't report hugepage tables as memory leaks when using kmemleak · 803d690e
      Christophe Leroy authored
      When a process allocates a hugepage, the following leak is
      reported by kmemleak. This is a false positive which is
      due to the pointer to the table being stored in the PGD
      as a physical memory address and not a virtual memory pointer.
      
      unreferenced object 0xc30f8200 (size 512):
        comm "mmap", pid 374, jiffies 4872494 (age 627.630s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<e32b68da>] huge_pte_alloc+0xdc/0x1f8
          [<9e0df1e1>] hugetlb_fault+0x560/0x8f8
          [<7938ec6c>] follow_hugetlb_page+0x14c/0x44c
          [<afbdb405>] __get_user_pages+0x1c4/0x3dc
          [<b8fd7cd9>] __mm_populate+0xac/0x140
          [<3215421e>] vm_mmap_pgoff+0xb4/0xb8
          [<c148db69>] ksys_mmap_pgoff+0xcc/0x1fc
          [<4fcd760f>] ret_from_syscall+0x0/0x38
      
      See commit a984506c ("powerpc/mm: Don't report PUDs as
      memory leaks when using kmemleak") for detailed explanation.
      
      To fix that, this patch tells kmemleak to ignore the allocated
      hugepage table.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/tm: Reformat comments · 306b1c06
      Michael Neuling authored
      The comments in this file don't conform to the coding style so take
      them to "Comment Formatting Re-Education Camp".
      Suggested-by: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      [mpe: Reflow some comments and add full stops, fix spelling of Sergeant.]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/config: Enable CONFIG_PRINTK_TIME · 5bd9b444
      Petr Vorel authored
      Enable it for 64-bit configs whose CONFIG_LOG_BUF_SHIFT is the same
      as or higher than the default (currently 17).
      Signed-off-by: Petr Vorel <pvorel@suse.cz>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Remove duplicated include from pci_32.c · 01b9870e
      YueHaibing authored
      Remove duplicated include.
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: consolidate MCE counter increment. · 8a03e81c
      Michal Suchanek authored
      The code in machine_check_exception excludes 64s hvmode when
      incrementing the MCE counter only to call opal_machine_check to
      increment it specifically for this case.
      
      Remove the exclusion and special case.
      
      Fixes: a43c1590 ("powerpc/pseries: Flush SLB contents on SLB MCE errors.")
      Signed-off-by: Michal Suchanek <msuchanek@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/tm: Print 64-bits MSR · 51303113
      Breno Leitao authored
      On a kernel TM Bad thing program exception, the Machine State Register
      (MSR) is not being properly displayed. The exception code dumps a 32-bit
      value but MSR is a 64-bit register on all platforms that have HTM
      enabled.
      
      This patch dumps the MSR value as a 64-bit value instead of 32 bits. In
      order to do so, the 'reason' variable could not be used, since it trimmed
      MSR to 32 bits (int).
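
      The truncation is easy to reproduce outside the kernel. In the sketch below the two `format_msr_*` helpers are hypothetical and use snprintf() rather than the kernel's printing functions; the MSR value in the usage note is an arbitrary illustrative number, not one taken from a real trace.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats the MSR after squeezing it through a 32-bit variable. */
int format_msr_truncated(char *buf, size_t len, uint64_t msr)
{
    unsigned int reason = (unsigned int)msr; /* upper 32 bits are lost here */

    assert(buf && len > 0);
    return snprintf(buf, len, "%08x", reason);
}

/* Formats all 64 bits of the MSR. */
int format_msr_full(char *buf, size_t len, uint64_t msr)
{
    assert(buf && len > 0);
    return snprintf(buf, len, "%016" PRIx64, msr);
}
```

      For an illustrative value such as 0x8000000300003033, the truncated form shows only 00003033, dropping every bit above bit 31, while the full form shows all sixteen hex digits.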
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/tm: Remove msr_tm_active() · 5c784c84
      Breno Leitao authored
      Currently msr_tm_active() is a wrapper around MSR_TM_ACTIVE() if
      CONFIG_PPC_TRANSACTIONAL_MEM is set, or it is just a function that
      returns false if CONFIG_PPC_TRANSACTIONAL_MEM is not set.
      
      This function is not necessary, since MSR_TM_ACTIVE() does the same thing
      and can be used instead, removing the duplication and simplifying the code.
      
      This patch removes every instance of msr_tm_active() and replaces it
      with MSR_TM_ACTIVE().
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Mark function as __noreturn · 62dea077
      Breno Leitao authored
      There is a mismatch between function pnv_platform_error_reboot() definition
      and declaration regarding function modifiers. In the declaration part, it
      contains the function attribute __noreturn, while function definition
      itself lacks it.
      
      This was reported by sparse tool as an error:
      
        arch/powerpc/platforms/powernv/opal.c:538:6: error: symbol 'pnv_platform_error_reboot' redeclared with different type (originally declared at arch/powerpc/platforms/powernv/powernv.h:11) - different modifiers
      
      I checked and the function is already being considered as being 'noreturn'
      by the compiler, thus, I understand this patch does not change any code
      being generated.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: New PTRACE_SYSEMU test · fc35ef12
      Breno Leitao authored
      This patch adds a new test for the new PTRACE_SYSEMU ptrace request.
      
      This test also relies on PTRACE_GETREGS and PTRACE_SETREGS requests to
      run properly, since the trace instruction (gettid() syscall) is being
      modified at run-time (by PTRACE_SETREGS) and re-executed three times.
      PTRACE_GETREGS is being used to check that the registers are still
      sane.
      
      This test basically creates a child process that executes syscalls
      and the parent process checks that it is being traced appropriately. The
      parent process guarantees that the syscalls are being traced with
      PTRACE_SYSEMU, and ptrace stops the child application before a syscall is
      executed. The test validates this by checking that the system call
      arguments, such as argv[0] (r3), which is the same register that will
      hold the syscall return value on powerpc, are not overwritten with a
      return value under PTRACE_SYSEMU, i.e. they keep their current values,
      meaning that the registers were not clobbered.
      
      This test is basically the same test for x86 located at
      tools/testing/selftests/x86/ptrace_syscall.c, limited to test PTRACE_SYSEMU
      request, and ported to PowerPC.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/ptrace: Add support for PTRACE_SYSEMU · 5521eb4b
      Breno Leitao authored
      This is a patch that adds support for PTRACE_SYSEMU ptrace request in
      PowerPC architecture.
      
      When ptrace(PTRACE_SYSEMU, ...) request is called, it will be handled by
      the arch independent function ptrace_resume(), which will tag the task with
      the TIF_SYSCALL_EMU flag. This flag needs to be handled from a platform
      dependent point of view, which is what this patch does.
      
      This patch adds this task's flag as part of the _TIF_SYSCALL_DOTRACE, which
      is the MACRO that is used to trace syscalls at entrance/exit.
      
      Since TIF_SYSCALL_EMU is now part of _TIF_SYSCALL_DOTRACE, if the task has
      _TIF_SYSCALL_DOTRACE set, it will hit do_syscall_trace_enter() at syscall
      entrance and do_syscall_trace_leave() at syscall leave.
      do_syscall_trace_enter() needs to handle the TIF_SYSCALL_EMU flag properly,
      which will interrupt the syscall execution if TIF_SYSCALL_EMU is set. The
      output values should not be changed, i.e. the return value (r3) should
      contain the original syscall argument on exit.
      
      With this flag set, the syscall is not executed at all, because
      do_syscall_trace_enter() returns -1, which is bigger than NR_syscall,
      thus skipping the syscall execution and exiting to userspace.
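
      The statement that -1 is "bigger than NR_syscall" only holds if the bound check is an unsigned comparison; the sketch below assumes that reading. The `model_*` functions and the table size of 400 are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_SYSCALLS 400UL /* hypothetical syscall table size */

/* Returns the syscall number to dispatch, or -1 when emulation is requested. */
long model_syscall_trace_enter(long nr, bool sysemu)
{
    assert(nr >= 0);
    return sysemu ? -1 : nr;
}

/*
 * Bound check as an unsigned comparison: -1 converts to the largest
 * unsigned long, so it fails the check and the syscall is skipped.
 */
bool model_should_dispatch(long nr)
{
    return (unsigned long)nr < MODEL_NR_SYSCALLS;
}
```

      With sysemu set, model_should_dispatch() returns false for any syscall number, so the modelled syscall body never runs; without it, valid numbers are dispatched as usual.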
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Redefine TIF_32BITS thread flag · 16d7c69c
      Breno Leitao authored
      Move TIF_32BIT to use bit 20 instead of 4 in the task flag field.
      
      This change makes room for an upcoming new task flag macro
      (_TIF_SYSCALL_EMU), which should preferably use a bit in the lower 16 bits
      of the word.
      
      This upcoming flag macro will take part in a composed macro
      (_TIF_SYSCALL_DOTRACE) which will contain other flags as well, and it is
      preferred that the whole _TIF_SYSCALL_DOTRACE macro only sets the lower 16
      bits of a word, so, it could be handled using immediate operations (as load
      immediate, add immediate, ...) where the immediate operand (SI) is limited
      to 16-bits.
      
      Another possible solution would be using the LOAD_REG_IMMEDIATE() macro
      to load a full 64-bit immediate, but it takes 5 operations instead of
      one.
      
      Redefining TIF_32BIT to use an upper bit is not a problem since there is
      only one place in the assembly code where TIF_32BIT is used, and it can
      be replaced with an operation with right shift (addis), since the flag is
      used alone there, i.e. it is not part of a composed macro, which has
      different bits set and would require LOAD_REG_IMMEDIATE().
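
      The constraint can be checked with a few bit tests. This is only a sketch of the reasoning: the helper names are hypothetical, and "shifted immediate" here simply means a 16-bit value applied to bits 16 to 31 of the word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TIF_32BIT 20 /* bit number given in the commit message */

/* True when the mask fits in a plain 16-bit unsigned immediate. */
bool fits_low_imm16(uint64_t mask)
{
    return (mask & ~UINT64_C(0xffff)) == 0;
}

/* True when the mask lies entirely in bits 16..31 (shifted immediate). */
bool fits_high_imm16(uint64_t mask)
{
    return mask != 0 && (mask & ~UINT64_C(0xffff0000)) == 0;
}

/* The 16-bit operand a shifted-immediate form would carry for this mask. */
uint16_t high_imm16(uint64_t mask)
{
    assert(fits_high_imm16(mask));
    return (uint16_t)(mask >> 16);
}
```

      A single flag at bit 20 can be tested with one shifted immediate (operand 0x10), but a mask that mixes bit 20 with a low bit fits neither form, which is why the composed macro is kept within the lower 16 bits.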
      
      Tested on a 64-bit big endian machine running a 32-bit task.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: add stack protector support · 06ec27ae
      Christophe Leroy authored
      On PPC64, as register r13 points to the paca_struct at all times,
      this patch adds a copy of the canary there, which is copied at
      task_switch.
      That new canary is then used by using the following GCC options:
      -mstack-protector-guard=tls
      -mstack-protector-guard-reg=r13
      -mstack-protector-guard-offset=offsetof(struct paca_struct, canary)
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/32: add stack protector support · c3ff2a51
      Christophe Leroy authored
      This functionality was tentatively added in the past
      (commit 6533b7c1 ("powerpc: Initial stack protector
      (-fstack-protector) support")) but had to be reverted
      (commit f2574030 ("powerpc: Revert the initial stack
      protector support")) because GCC implemented it differently
      depending on whether it had been built with libc support or not.
      
      Now, GCC offers the possibility to manually set the
      stack-protector mode (global or tls) regardless of libc support.
      
      This time, the patch selects HAVE_STACKPROTECTOR only if
      -mstack-protector-guard=tls is supported by GCC.
      
      On PPC32, as register r2 points to the current task_struct at
      all times, the stack_canary located inside task_struct can be
      used directly by using the following GCC options:
      -mstack-protector-guard=tls
      -mstack-protector-guard-reg=r2
      -mstack-protector-guard-offset=offsetof(struct task_struct, stack_canary)
      
      The protector is disabled for prom_init and bootx_init as
      it is too early to handle it properly.
      
       $ echo CORRUPT_STACK > /sys/kernel/debug/provoke-crash/DIRECT
      [  134.943666] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: lkdtm_CORRUPT_STACK+0x64/0x64
      [  134.943666]
      [  134.955414] CPU: 0 PID: 283 Comm: sh Not tainted 4.18.0-s3k-dev-12143-ga3272be41209 #835
      [  134.963380] Call Trace:
      [  134.965860] [c6615d60] [c001f76c] panic+0x118/0x260 (unreliable)
      [  134.971775] [c6615dc0] [c001f654] panic+0x0/0x260
      [  134.976435] [c6615dd0] [c032c368] lkdtm_CORRUPT_STACK_STRONG+0x0/0x64
      [  134.982769] [c6615e00] [ffffffff] 0xffffffff
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/xive: Move a dereference below a NULL test · cd5ff945
      zhong jiang authored
      Move the dereference of xc below the NULL test.
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Fix how we iterate over the DTL entries · 9258227e
      Naveen N. Rao authored
      When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we look up dtl_idx in
      the lppaca to determine the number of entries in the buffer. Since
      lppaca is in big endian, we need to do an endian conversion before using
      this in our calculation to determine the number of entries in the
      buffer. Without this, we do not iterate over the existing entries in the
      DTL buffer properly.
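
      The effect of the missing conversion can be shown with a portable byte-wise decode. The helpers below are hypothetical stand-ins: `load_be64` plays the role of an endian conversion on the stored dtl_idx bytes, and `load_le64` shows what a little endian CPU would read from the same bytes without one.

```c
#include <assert.h>
#include <stdint.h>

/* Decodes 8 bytes stored in big endian order, on any host. */
uint64_t load_be64(const unsigned char bytes[8])
{
    uint64_t v = 0;
    int i;

    assert(bytes);
    for (i = 0; i < 8; i++)
        v = (v << 8) | bytes[i];
    return v;
}

/* What a little endian CPU sees if it reads the same bytes unconverted. */
uint64_t load_le64(const unsigned char bytes[8])
{
    uint64_t v = 0;
    int i;

    assert(bytes);
    for (i = 7; i >= 0; i--)
        v = (v << 8) | bytes[i];
    return v;
}
```

      For an index of 5 stored in big endian order, load_be64() returns 5 while load_le64() returns 0x0500000000000000, so any entry count derived from the unconverted value is wildly off.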
      
      Fixes: 7c105b63 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Fix DTL buffer registration · db787af1
      Naveen N. Rao authored
      When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we register the DTL
      buffer for a cpu when the associated file under powerpc/dtl in debugfs
      is opened. When doing so, we need to set the size of the buffer being
      registered in the second u32 word of the buffer. This needs to be in big
      endian, but we are not doing the conversion resulting in the below error
      showing up in dmesg:
      
      	dtl_start: DTL registration for cpu 0 (hw 0) failed with -4
      
      Fix this in the obvious manner.
      
      Fixes: 7c105b63 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/traps: merge unrecoverable_exception() and nonrecoverable_exception() · 51423a9c
      Christophe Leroy authored
      PPC32 uses nonrecoverable_exception() while PPC64 uses
      unrecoverable_exception().
      
      Both functions are doing almost the same thing.
      
      This patch removes nonrecoverable_exception()
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Convert to using %pOFn instead of device_node.name · b9ef7b4b
      Rob Herring authored
      In preparation to remove the node name pointer from struct device_node,
      convert printf users to use the %pOFn format specifier.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • macintosh: Convert to using %pOFn instead of device_node.name · 0bdba867
      Rob Herring authored
      In preparation to remove the node name pointer from struct device_node,
      convert printf users to use the %pOFn format specifier.
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Use of_irq_get helper() in request_event_sources_irqs() · c417596d
      Rob Herring authored
      Instead of calling both of_irq_parse_one() and
      irq_create_of_mapping(), call of_irq_get() instead which does
      essentially the same thing. of_irq_get() also calls irq_find_host()
      for deferred probe support, but this should be fine as
      irq_create_of_mapping() also calls that internally. This gets us
      closer to making the former 2 functions static.
      
      In the process of simplifying request_event_sources_irqs(), combine
      the pr_err() and WARN_ON() calls to just a WARN().
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/cell: Use irq_of_parse_and_map() helper · 8c8933eb
      Rob Herring authored
      Instead of calling both of_irq_parse_one() and
      irq_create_of_mapping(), call of_irq_parse_and_map() instead which
      does the same thing. This gets us closer to making the former 2
      functions static.
      Signed-off-by: Rob Herring <robh@kernel.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/thp: update pmd_trans_huge to check for pmd_present · 8890e033
      Aneesh Kumar K.V authored
      We need to make sure pmd_trans_huge returns false for a pmd migration entry.
      We mark the migration entry by clearing the _PAGE_PRESENT bit. We keep the
      _PAGE_PTE bit set to indicate a leaf page table entry. Hence we need to make
      sure we check for pmd_present() so that pmd_trans_huge won't return true on
      pmd migration entry.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • arch/powerpc/mm/hash: validate the pte entries before handling the hash fault · 75646c48
      Aneesh Kumar K.V authored
      Make sure we are operating on THP and hugetlb entries in the respective hash
      fault handling routines.
      
      No functional change in this patch. If we walked the table wrongly before, we
      will retry the access.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/book3s: Check for pmd_large instead of pmd_trans_huge · ae28f17b
      Aneesh Kumar K.V authored
      Update a few code paths to check for pmd_large.
      
      set_pmd_at:
      We want to use this to store a swap pte at pmd level. For swap ptes we don't want
      to set H_PAGE_THP_HUGE. Hence check for pmd_large in set_pmd_at. This removes
      the false WARN_ON when using this with a swap pmd entry.
      
      pmd_page:
      We don't really use them on pmd migration entries. But they can also work with
      migration entries and we don't differentiate at the pte level. Hence update
      pmd_page to work with pmd migration entries too.
      
      __find_linux_pte:
      The lockless page table walk needs to handle pmd migration entries. The pmd_trans_huge
      check will return false on them. We don't set thp = 1 for such entries, but
      update hpage_shift correctly. Without this we will walk pmd migration entries
      as a pte page pointer, which is wrong.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/hugetlb/book3s: add _PAGE_PRESENT to hugepd pointer. · f1981b5b
      Aneesh Kumar K.V authored
      This makes the hugetlb directory pointer similar to other page table entries. A
      hugepd entry is identified by the lack of the _PAGE_PTE bit and by the directory
      size stored in HUGEPD_SHIFT_MASK. We update that to also look at _PAGE_PRESENT.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f1981b5b
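      The hugepd check described in f1981b5b can be sketched in userspace C as
      below. This is only an illustration: the helper name entry_is_hugepd() is
      hypothetical, the _PAGE_PRESENT value is the one quoted in da7ad366, and
      the _PAGE_PTE and HUGEPD_SHIFT_MASK values are assumptions that should be
      checked against the book3s/64 headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values used for illustration only. */
#define PAGE_PRESENT_BIT  0x8000000000000000ULL /* _PAGE_PRESENT, as quoted in da7ad366 */
#define PAGE_PTE_BIT      0x4000000000000000ULL /* _PAGE_PTE (assumed value) */
#define HUGEPD_SHIFT_BITS 0x3fULL               /* HUGEPD_SHIFT_MASK (assumed value) */

/*
 * Hypothetical helper: an entry is treated as a hugepd pointer when it is
 * not a pte, is marked present, and carries a non-zero directory size.
 */
static bool entry_is_hugepd(uint64_t val)
{
	if (val & PAGE_PTE_BIT)
		return false;		/* a leaf pte, not a directory pointer */
	if (!(val & PAGE_PRESENT_BIT))
		return false;		/* the check added by this patch */
	return (val & HUGEPD_SHIFT_BITS) != 0;
}
```

      With the extra _PAGE_PRESENT test, a non-present entry that happens to
      have low bits set is no longer mistaken for a hugepd pointer.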
    • Aneesh Kumar K.V's avatar
      powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit · da7ad366
      Aneesh Kumar K.V authored
      With this patch we use 0x8000000000000000UL (_PAGE_PRESENT) to indicate a valid
      pgd/pud/pmd entry. We also switch the p**_present() to look at this bit.
      
      With pmd_present, we have a special case. We need to make sure we consider a
      pmd marked invalid during THP split as present. Right now we clear the
      _PAGE_PRESENT bit during a pmdp_invalidate. In order to consider this special
      case we add a new pte bit _PAGE_INVALID (mapped to _RPAGE_SW0). This bit is
      only used with _PAGE_PRESENT cleared. Hence we are not really losing a pte bit
      for this special case. pmd_present is also updated to look at _PAGE_INVALID.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      da7ad366
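      A minimal userspace sketch of the pmd_present rule from da7ad366: a pmd
      counts as present if either _PAGE_PRESENT or _PAGE_INVALID is set. The
      function names are hypothetical, and the _PAGE_INVALID value assumes
      _RPAGE_SW0 is 0x2000000000000000, which is not stated in the commit
      message. Endianness handling in the real page table code is ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_PRESENT_BIT 0x8000000000000000ULL /* _PAGE_PRESENT, from the commit message */
#define PAGE_INVALID_BIT 0x2000000000000000ULL /* _PAGE_INVALID = _RPAGE_SW0 (assumed value) */

/* Hypothetical stand-in for pmd_present(): present, or invalidated during a THP split. */
static bool sketch_pmd_present(uint64_t pmd)
{
	return (pmd & (PAGE_PRESENT_BIT | PAGE_INVALID_BIT)) != 0;
}

/* Hypothetical stand-in for the invalidate step: clear present, mark invalid. */
static uint64_t sketch_pmd_invalidate(uint64_t pmd)
{
	return (pmd & ~PAGE_PRESENT_BIT) | PAGE_INVALID_BIT;
}
```

      Because _PAGE_INVALID is only meaningful while _PAGE_PRESENT is clear,
      the invalidated pmd still reads as present to callers that use the
      combined check.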
    • Vaibhav Jain's avatar
      powerpc/powernv: Make possible for user to force a full ipl cec reboot · 8139046a
      Vaibhav Jain authored
      Ever since fast reboot was enabled by default in OPAL,
      opal_cec_reboot() has used fast-reset instead of a full IPL to perform a
      system reboot. This leaves the user with no direct way to force a full
      IPL reboot except changing an nvram setting that persistently disables
      fast-reset for all subsequent reboots.
      
      This patch provides a more direct way for the user to force a one-shot
      full IPL reboot by passing the command line argument 'full' to the
      reboot command. So the user will be able to tweak the reboot behavior
      via:
      
        $ sudo reboot full	# Force a full ipl reboot skipping fast-reset
      
        or
        $ sudo reboot  	# default reboot path (usually fast-reset)
      
      The reboot command passes the unparsed command argument to the kernel
      via the reboot syscall, which then passes it on to the arch function
      pnv_restart(). The patch updates pnv_restart() to handle this cmd-arg
      and issue opal_cec_reboot2 with OPAL_REBOOT_FULL_IPL to force a full
      IPL reset.
      Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8139046a
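      The argument handling described in 8139046a amounts to a string compare
      on the reboot command. The sketch below shows only that decision, in
      userspace C; the enum and the function choose_reboot_path() are
      hypothetical names, and the actual OPAL calls (opal_cec_reboot() and
      opal_cec_reboot2()) are deliberately not modelled.

```c
#include <string.h>

/* Hypothetical labels for the two paths named in the commit message. */
enum reboot_path {
	REBOOT_DEFAULT,		/* opal_cec_reboot(): usually fast-reset */
	REBOOT_FULL_IPL		/* opal_cec_reboot2() with OPAL_REBOOT_FULL_IPL */
};

/* Pick the reboot path from the unparsed command string; cmd may be NULL. */
static enum reboot_path choose_reboot_path(const char *cmd)
{
	if (cmd != NULL && strcmp(cmd, "full") == 0)
		return REBOOT_FULL_IPL;
	return REBOOT_DEFAULT;
}
```

      Only an exact match on "full" selects the full IPL path in this sketch;
      a missing or different argument falls back to the default reboot.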
    • Michael Ellerman's avatar
      powerpc/perf: Add missing break in power7_marked_instr_event() · db6711b7
      Michael Ellerman authored
      In power7_marked_instr_event() there is a switch case that is missing
      a break or an explicit fallthrough. It's not immediately clear which
      it should be.
      
      The function determines, based on the PMU event code, whether the event
      is a "marked" event (which then requires us to configure the PMU in a
      certain way). On Power7 there are no specific bits in the event to
      tell us that, we just have to know.
      
      Rather than having a full list of every event and whether they are
      marked, we pull apart the event code and for events with certain
      values of certain fields we can say that those are all marked events.
      
      We take the psel (bits 0-7) of the event, and look at bits 4-7. For a
      value of 6 we say that if the entire psel == 0x64 then if the pmc == 3
      the event is marked, else not, and otherwise we continue.
      
      It is then that we fall through to the 8 case, where we return true if
      the unit == 0xd.
      
      The question is whether the 6 case should also fall through and check
      for unit == 0xd, or whether it should return.
      
      Looking at the full list of events we see that there are zero events
      where (psel >> 4) == 0x6 and unit == 0xd.
      
      So the answer is it doesn't really matter, there are no valid event
      codes that will return a different result whether we fall through or
      break.
      
      But equally, testing the 6 case events against unit == 0xd is slightly
      bogus, as there are no such events. So to make the code clearer, and
      avoid any future confusion, have the 6 case break rather than falling
      through.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      db6711b7
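      The control flow discussed in db6711b7 can be sketched as follows. This
      is a reduced illustration, not the kernel function: it covers only the 6
      and 8 cases, takes psel, pmc and unit as already-extracted values, and
      follows the commit message's wording for the psel == 0x64 test. The name
      marked_case_6_or_8() is hypothetical.

```c
#include <stdbool.h>

/*
 * Reduced sketch of the switch on (psel >> 4) after the fix: the 6 case
 * now breaks instead of falling through to the unit == 0xd test.
 */
static bool marked_case_6_or_8(unsigned int psel, unsigned int pmc,
			       unsigned int unit)
{
	switch (psel >> 4) {
	case 6:
		if (psel == 0x64)
			return pmc == 3;
		break;			/* added by the patch; no fallthrough */
	case 8:
		return unit == 0xd;
	}
	return false;
}
```

      With the break in place, a 6-case event other than psel == 0x64 is never
      tested against unit == 0xd, which matches the observation that no such
      events exist.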