1. 08 Oct, 2018 3 commits
  2. 04 Oct, 2018 9 commits
    • powerpc/64s/radix: Explicitly flush ERAT with local LPID invalidation · 053c5a75
      Nicholas Piggin authored
      Local radix TLB flush operations that operate on congruence classes
      have explicit ERAT flushes for POWER9. The process scoped LPID flush
      did not have a flush, so add it.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Do not use PPC_INVALIDATE_ERAT on CPUs before POWER9 · bc276ecb
      Nicholas Piggin authored
      PPC_INVALIDATE_ERAT is slbia IH=7 which is a new variant introduced
      with POWER9, and the result is undefined on earlier CPUs.
      
      Commits 7b9f71f9 ("powerpc/64s: POWER9 machine check handler") and
      d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on
      POWER9") caused POWER7/8 code to use this instruction. Remove it. An
      ERAT flush can be made by invalidating the SLB, but before POWER9 that
      requires a flush and rebolt.
      
      Fixes: 7b9f71f9 ("powerpc/64s: POWER9 machine check handler")
      Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/time: Add set_state_oneshot_stopped decrementer callback · 81759360
      Anton Blanchard authored
      If CONFIG_PPC_WATCHDOG is enabled we always cap the decrementer to
      0x7fffffff:
      
              if (IS_ENABLED(CONFIG_PPC_WATCHDOG))
                      set_dec(0x7fffffff);
              else
                      set_dec(decrementer_max);
      
      If there are no future events, we don't reprogram the decrementer
      after this and we end up with 0x7fffffff even on a large decrementer
      capable system.
      
      As suggested by Nick, add a set_state_oneshot_stopped callback
      so we program the decrementer with decrementer_max if there are
      no future events.
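
      The effect described in this message can be illustrated with a toy
      userspace model. This is only a sketch and not kernel code: struct
      dec_model and both model_* helpers are hypothetical names, and the
      model assumes the stopped callback simply reloads decrementer_max.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the decrementer state; all names here are hypothetical. */
struct dec_model {
	bool watchdog_enabled;    /* stands in for CONFIG_PPC_WATCHDOG */
	uint64_t decrementer_max; /* largest value the hardware accepts */
	uint64_t dec;             /* value currently programmed */
};

/* Mirrors the snippet quoted in the commit message: cap when the watchdog is on. */
static void model_cap_decrementer(struct dec_model *m)
{
	assert(m != NULL);
	m->dec = m->watchdog_enabled ? 0x7fffffffULL : m->decrementer_max;
}

/* Assumed behaviour of a oneshot-stopped callback: no events pending, so load the maximum. */
static void model_oneshot_stopped(struct dec_model *m)
{
	assert(m != NULL);
	m->dec = m->decrementer_max;
}
```

      In this model, calling model_cap_decrementer() with the watchdog
      enabled leaves dec at 0x7fffffff until model_oneshot_stopped() runs,
      which is the gap the callback is meant to close.
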
      Signed-off-by: Anton Blanchard <anton@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer · 8b78fdb0
      Anton Blanchard authored
      We currently cap the decrementer clockevent at 4 seconds, even on systems
      with large decrementer support. Fix this by converting the code to use
      clockevents_register_device() which calculates the upper bound based on
      the max_delta passed in.
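
      The 4-second figure can be checked with a little arithmetic. The
      sketch below assumes a 512 MHz timebase and a 56-bit large
      decrementer; neither value is stated in the commit message, and the
      helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed timebase frequency in Hz; the commit message does not state it. */
#define TB_FREQ_HZ 512000000ULL

/* Largest positive value of a signed decrementer that is `bits` wide. */
static uint64_t dec_max_for_bits(unsigned int bits)
{
	assert(bits >= 2 && bits <= 64);
	return (1ULL << (bits - 1)) - 1;
}

/* Whole seconds until a decrementer loaded with `max_delta` ticks expires. */
static uint64_t max_sleep_seconds(uint64_t max_delta, uint64_t tb_freq_hz)
{
	assert(tb_freq_hz != 0);
	return max_delta / tb_freq_hz;
}
```

      Under these assumptions a 32-bit decrementer (0x7fffffff ticks)
      expires after about 4 seconds, while a 56-bit one could sleep for
      roughly 70 million seconds.
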
      Signed-off-by: Anton Blanchard <anton@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/npu: Remove atsd_threshold debugfs setting · f86ad3e0
      Mark Hairgrove authored
      This threshold is no longer used now that all invalidates issue a single
      ATSD to each active NPU.
      Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Reviewed-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/npu: Use size-based ATSD invalidates · 3689c37d
      Mark Hairgrove authored
      Prior to this change only two types of ATSDs were issued to the NPU:
      invalidates targeting a single page and invalidates targeting the whole
      address space. The crossover point was the configurable atsd_threshold,
      which defaulted to 2M. Invalidates of that size or smaller were issued as
      per-page invalidates covering the whole range.
      
      The NPU supports more invalidation sizes however: 64K, 2M, 1G, and all.
      These invalidates target addresses aligned to their size. 2M is a common
      invalidation size for GPU-enabled applications because that is a GPU
      page size, so reducing the number of invalidates by 32x in that case is a
      clear improvement.
      
      ATSD latency is high in general so now we always issue a single invalidate
      rather than multiple. This will over-invalidate in some cases, but for any
      invalidation size over 2M it matches or improves the prior behavior.
      There's also an improvement for single-page invalidates since the prior
      version issued two invalidates for that case instead of one.
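
      One way to picture the size selection is the sketch below. It is
      not the kernel's implementation: pick_atsd_size(), the enum, and
      the rule "smallest aligned block that contains the whole range" are
      assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K (64ULL << 10)
#define SZ_2M  (2ULL << 20)
#define SZ_1G  (1ULL << 30)

/* Hypothetical labels for the four invalidation sizes named in the text. */
enum atsd_size { ATSD_64K, ATSD_2M, ATSD_1G, ATSD_ALL };

/* Round `addr` down to a multiple of `size`, which must be a power of two. */
static uint64_t align_down(uint64_t addr, uint64_t size)
{
	return addr & ~(size - 1);
}

/*
 * Pick the smallest size whose single aligned block contains every byte
 * of [start, start + len). Falls back to ATSD_ALL when no block does.
 */
static enum atsd_size pick_atsd_size(uint64_t start, uint64_t len)
{
	static const uint64_t sizes[] = { SZ_64K, SZ_2M, SZ_1G };
	uint64_t last;
	int i;

	assert(len > 0 && start <= UINT64_MAX - (len - 1));
	last = start + len - 1;

	for (i = 0; i < 3; i++) {
		/* Same aligned base for first and last byte: one block covers it. */
		if (align_down(start, sizes[i]) == align_down(last, sizes[i]))
			return (enum atsd_size)i;
	}
	return ATSD_ALL;
}
```

      With this rule an aligned 2M range maps to one 2M invalidate
      instead of 32 invalidates of 64K each, while a 128K range that
      straddles a 2M boundary is over-invalidated with a 1G block.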
      
      With this change all issued ATSDs now perform a flush, so the flush
      parameter has been removed from all the helpers.
      
      To show the benefit here are some performance numbers from a
      microbenchmark which creates a 1G allocation then uses mprotect with
      PROT_NONE to trigger invalidates in strides across the allocation.
      
      One NPU (1 GPU):
      
               mprotect rate (GB/s)
      Stride   Before      After      Speedup
      64K         5.3        5.6           5%
      1M         39.3       57.4          46%
      2M         49.7       82.6          66%
      4M        286.6      285.7           0%
      
      Two NPUs (6 GPUs):
      
               mprotect rate (GB/s)
      Stride   Before      After      Speedup
      64K         6.5        7.4          13%
      1M         33.4       67.9         103%
      2M         38.7       93.1         141%
      4M        356.7      354.6          -1%
      
      Anything over 2M is roughly the same as before since both cases issue a
      single ATSD.
      Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Reviewed-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/npu: Reduce eieio usage when issuing ATSD invalidates · 7ead15a1
      Mark Hairgrove authored
      There are two types of ATSDs issued to the NPU: invalidates targeting a
      specific virtual address and invalidates targeting the whole address
      space. In both cases prior to this change, the sequence was:
      
          for each NPU
              - Write the target address to the XTS_ATSD_AVA register
              - EIEIO
              - Write the launch value to issue the ATSD
      
      First, a target address is not required when invalidating the whole
      address space, so that write and the EIEIO have been removed. The AP
      (size) field in the launch is not needed either.
      
      Second, for per-address invalidates the above sequence is inefficient in
      the common case of multiple NPUs because an EIEIO is issued per NPU. This
      unnecessarily forces the launches of later ATSDs to be ordered with the
      launches of earlier ones. The new sequence only issues a single EIEIO:
      
          for each NPU
              - Write the target address to the XTS_ATSD_AVA register
          EIEIO
          for each NPU
              - Write the launch value to issue the ATSD
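
      The two orderings can be compared with a small userspace trace.
      This is a sketch only: it records operation names rather than
      performing MMIO, and struct trace and the helper functions are
      hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum op { OP_WRITE_AVA, OP_EIEIO, OP_LAUNCH };

#define TRACE_MAX 64

/* Ordered record of the operations a sequence would issue. */
struct trace {
	enum op ops[TRACE_MAX];
	size_t len;
};

static void record(struct trace *t, enum op op)
{
	assert(t != NULL && t->len < TRACE_MAX);
	t->ops[t->len++] = op;
}

/* Old ordering: address write, barrier, and launch for each NPU in turn. */
static void old_sequence(struct trace *t, size_t nr_npus)
{
	size_t i;

	for (i = 0; i < nr_npus; i++) {
		record(t, OP_WRITE_AVA);
		record(t, OP_EIEIO);
		record(t, OP_LAUNCH);
	}
}

/* New ordering: all address writes, one barrier, then all launches. */
static void new_sequence(struct trace *t, size_t nr_npus)
{
	size_t i;

	for (i = 0; i < nr_npus; i++)
		record(t, OP_WRITE_AVA);
	record(t, OP_EIEIO);
	for (i = 0; i < nr_npus; i++)
		record(t, OP_LAUNCH);
}

/* Count how many times `op` appears in the trace. */
static size_t count_ops(const struct trace *t, enum op op)
{
	size_t i, n = 0;

	for (i = 0; i < t->len; i++)
		if (t->ops[i] == op)
			n++;
	return n;
}
```

      For two NPUs the old trace contains two barriers and the new one
      contains a single barrier placed between the address writes and
      the launches.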
      
      Performance results were gathered using a microbenchmark which creates a
      1G allocation then uses mprotect with PROT_NONE to trigger invalidates in
      strides across the allocation.
      
      With only a single NPU active (one GPU) the difference is in the noise for
      both types of invalidates (+/-1%).
      
      With two NPUs active (on a 6-GPU system) the effect is more noticeable:
      
               mprotect rate (GB/s)
      Stride   Before      After      Speedup
      64K         5.9        6.5          10%
      1M         31.2       33.4           7%
      2M         36.3       38.7           7%
      4M        322.6      356.7          11%
      Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Reviewed-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: remove leftover code of old GCC version checks · bad96de8
      Masahiro Yamada authored
      Clean up the leftovers of commit f2910f0e ("powerpc: remove old
      GCC version checks").
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/nohash: fix undefined behaviour when testing page size support · f5e28480
      Daniel Axtens authored
      When enumerating page size definitions to check hardware support,
      we construct a constant which is (1U << (def->shift - 10)).
      
      However, the array of page size definitions is only initialised for
      various MMU_PAGE_* constants, so it contains a number of 0-initialised
      elements with def->shift == 0. This means we end up shifting by a
      very large number, which gives the following UBSan splat:
      
      ================================================================================
      UBSAN: Undefined behaviour in /home/dja/dev/linux/linux/arch/powerpc/mm/tlb_nohash.c:506:21
      shift exponent 4294967286 is too large for 32-bit type 'unsigned int'
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-00045-ga604f927b012-dirty #6
      Call Trace:
      [c00000000101bc20] [c000000000a13d54] .dump_stack+0xa8/0xec (unreliable)
      [c00000000101bcb0] [c0000000004f20a8] .ubsan_epilogue+0x18/0x64
      [c00000000101bd30] [c0000000004f2b10] .__ubsan_handle_shift_out_of_bounds+0x110/0x1a4
      [c00000000101be20] [c000000000d21760] .early_init_mmu+0x1b4/0x5a0
      [c00000000101bf10] [c000000000d1ba28] .early_setup+0x100/0x130
      [c00000000101bf90] [c000000000000528] start_here_multiplatform+0x68/0x80
      ================================================================================
      
      Fix this by first checking if the element exists (shift != 0) before
      constructing the constant.
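
      The guard can be sketched in plain C as follows. This is not the
      kernel code: struct page_size_def and page_size_mask() are
      hypothetical, and the extra range assertion is an assumption added
      to keep the sketch well defined. The reported exponent 4294967286
      is what 0 - 10 wraps to in 32-bit unsigned arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a page size definition; only `shift` matters here. */
struct page_size_def {
	unsigned int shift; /* log2 of the page size in bytes, 0 if the slot is unused */
};

/*
 * Returns true and stores the constant when the entry is populated.
 * Unused entries (shift == 0) are skipped before the shift is evaluated,
 * because (0u - 10) would wrap to a huge exponent.
 */
static bool page_size_mask(const struct page_size_def *def, uint32_t *mask)
{
	assert(def != NULL && mask != NULL);

	if (def->shift == 0)
		return false;

	/* Assumption for this sketch: populated entries are 1K or larger. */
	assert(def->shift >= 10 && def->shift - 10 < 32);

	*mask = 1U << (def->shift - 10);
	return true;
}
```

      A 4K entry (shift 12) yields the constant 1U << 2, while a
      zero-initialised entry is rejected without performing any shift.
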
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 03 Oct, 2018 28 commits