29 Mar, 2021 (6 commits)
    • powerpc/signal64: Remove TM ifdefery in middle of if/else block · 2d19630e
      Christopher M. Riedl authored
      Both rt_sigreturn() and handle_rt_signal64() contain TM-related ifdefs
      which break up an if/else block. Provide stubs for the ifdef-guarded TM
      functions and remove the need for an ifdef in rt_sigreturn().
      
      Rework the remaining TM ifdef in handle_rt_signal64() along the lines of
      commit f1cf4f93 ("powerpc/signal32: Remove ifdefery in middle of if/else").
      
      Unlike in the commit for ppc32, the ifdef can't be removed entirely
      since uc_transact in sigframe depends on CONFIG_PPC_TRANSACTIONAL_MEM.
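      
      A minimal sketch of the stub pattern, with hypothetical helper names
      (restore_tm_context()/restore_context() stand in for the actual TM
      functions; this is illustrative, not the exact patch):
      
      	#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
      	static long restore_tm_context(struct ucontext __user *uc)
      	{
      		/* real TM restore path lives here */
      		return 0;
      	}
      	#else
      	/* Stub: keeps the caller's if/else free of ifdefs when TM
      	 * support is compiled out. */
      	static long restore_tm_context(struct ucontext __user *uc)
      	{
      		return -EINVAL;	/* TM is never active in this config */
      	}
      	#endif
      
      	/* Call site in rt_sigreturn(), now ifdef-free: */
      	if (MSR_TM_ACTIVE(msr))
      		err = restore_tm_context(uc);
      	else
      		err = restore_context(uc);
      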
      Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210227011259.11992-6-cmr@codefail.de
    • powerpc: Reference parameter in MSR_TM_ACTIVE() macro · 1a130b67
      Christopher M. Riedl authored
      Unlike the other MSR_TM_* macros, MSR_TM_ACTIVE does not reference or
      use its parameter unless CONFIG_PPC_TRANSACTIONAL_MEM is defined. This
      causes an 'unused variable' compile warning unless the variable is also
      guarded with CONFIG_PPC_TRANSACTIONAL_MEM.
      
      Make the macro reference its argument, without otherwise using it, to
      avoid the potential compile warning.
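      
      Illustrative before/after for the no-TM configuration (simplified sketch
      of the definition in arch/powerpc/include/asm/reg.h):
      
      	/* Before: the parameter is never expanded, so a variable used
      	 * only here triggers an 'unused variable' warning when TM
      	 * support is compiled out. */
      	#define MSR_TM_ACTIVE(x)	0
      
      	/* After: evaluate and discard the argument so it counts as used. */
      	#define MSR_TM_ACTIVE(x)	((void)(x), 0)
      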
      Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
      Reviewed-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210227011259.11992-5-cmr@codefail.de
    • powerpc/signal64: Remove non-inline calls from setup_sigcontext() · c6c9645e
      Christopher M. Riedl authored
      Most of setup_sigcontext() can be refactored to execute in an "unsafe"
      context, i.e. one that assumes an open uaccess window, except for some
      non-inline function calls. Move these out into a separate
      prepare_setup_sigcontext() function which must be called before opening
      the uaccess window. Non-inline function calls should be avoided while a
      uaccess window is open for a few reasons:
      
      	- KUAP should be enabled for as much kernel code as possible.
      	  Opening a uaccess window disables KUAP which means any code
      	  executed during this time contributes to a potential attack
      	  surface.
      
      	- Non-inline functions are traceable by default, which means
      	  they are instrumented for ftrace. This adds more code which
      	  could run with KUAP disabled.
      
      	- Powerpc does not currently support the objtool UACCESS checks.
      	  All code running with uaccess must be audited manually which
      	  means: less code -> less work -> fewer problems (in theory).
      
      A follow-up commit converts setup_sigcontext() to be "unsafe".
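      
      A rough sketch of the split, assuming the real powerpc helpers
      flush_fp_to_thread(), flush_altivec_to_thread() and
      user_write_access_begin()/user_write_access_end(); the bodies are
      illustrative, not the actual patch:
      
      	static void prepare_setup_sigcontext(struct task_struct *tsk)
      	{
      		/* Non-inline work runs here, *before* KUAP is disabled. */
      		flush_fp_to_thread(tsk);
      		flush_altivec_to_thread(tsk);
      	}
      
      	/* ... later, in the signal-frame setup path: */
      	prepare_setup_sigcontext(current);
      	if (!user_write_access_begin(frame, sizeof(*frame)))
      		return -EFAULT;
      	/* Only inline "unsafe" accessors run inside the window. */
      	unsafe_put_user(regs->nip, &sc->gp_regs[PT_NIP], efault_out);
      	user_write_access_end();
      	return 0;
      efault_out:
      	user_write_access_end();
      	return -EFAULT;
      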
      Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210227011259.11992-4-cmr@codefail.de
    • powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user() · 609355df
      Christopher M. Riedl authored
      Reuse the "safe" implementation from signal.c but call unsafe_get_user()
      directly in a loop to avoid the intermediate copy into a local buffer.
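      
      The FPR variant ends up close to the following shape (a sketch; the VSX
      variant is analogous):
      
      	#define unsafe_copy_fpr_from_user(task, from, label)	do {	\
      		struct task_struct *__t = task;				\
      		u64 __user *buf = (u64 __user *)from;			\
      		int i;							\
      									\
      		/* One unsafe_get_user() per FP register; no bounce	\
      		 * buffer in between. */				\
      		for (i = 0; i < ELF_NFPREG - 1; i++)			\
      			unsafe_get_user(__t->thread.TS_FPR(i),		\
      					&buf[i], label);		\
      		unsafe_get_user(__t->thread.fp_state.fpscr,		\
      				&buf[i], label);			\
      	} while (0)
      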
      Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
      Reviewed-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210227011259.11992-3-cmr@codefail.de
    • powerpc/uaccess: Add unsafe_copy_from_user() · 9466c179
      Christopher M. Riedl authored
      Use the same approach as unsafe_copy_to_user() but instead call
      unsafe_get_user() in a loop.
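      
      A simplified sketch of the idea; the real macro mirrors
      unsafe_copy_to_user() (including 4/2/1-byte tail handling), whereas this
      sketch finishes with a plain byte loop:
      
      	#define unsafe_copy_from_user(d, s, l, e)			\
      	do {								\
      		u8 *_dst = (u8 *)(d);					\
      		const u8 __user *_src = (const u8 __user *)(s);		\
      		size_t _i, _len = (l);					\
      									\
      		/* Copy a word at a time with unsafe_get_user()... */	\
      		for (_i = 0; _i + sizeof(u64) <= _len; _i += sizeof(u64)) \
      			unsafe_get_user(*(u64 *)(_dst + _i),		\
      					(u64 __user *)(_src + _i), e);	\
      		/* ...then finish any remaining bytes. */		\
      		for (; _i < _len; _i++)					\
      			unsafe_get_user(_dst[_i], &_src[_i], e);	\
      	} while (0)
      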
      Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210227011259.11992-2-cmr@codefail.de
    • powerpc/qspinlock: Use generic smp_cond_load_relaxed · deb9b13e
      Davidlohr Bueso authored
      Commit 49a7d46a ("powerpc: Implement smp_cond_load_relaxed()") added
      busy-wait pausing with a preferred SMT priority pattern, lowering the
      thread priority (reducing decode cycles) for the whole loop slowpath.
      
      However, data shows that while this pattern works well with simple
      spinlocks, queued spinlocks benefit more from being kept at medium
      priority, using a cpu_relax() instead, which is a low+medium combo on
      powerpc.
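      
      For reference, a rough reconstruction of the removed arch-specific macro
      (approximate, not verbatim); the generic fallback in
      asm-generic/barrier.h instead calls cpu_relax() on every iteration,
      which on powerpc 64-bit does HMT_low() immediately followed by
      HMT_medium():
      
      	/* Removed powerpc version (approximate): stay at low SMT
      	 * priority for the entire wait, restore medium only on exit. */
      	#define smp_cond_load_relaxed(ptr, cond_expr) ({		\
      		typeof(ptr) __PTR = (ptr);				\
      		__unqual_scalar_typeof(*ptr) VAL;			\
      		VAL = READ_ONCE(*__PTR);				\
      		if (unlikely(!(cond_expr))) {				\
      			spin_begin();		/* HMT_low()    */	\
      			do {						\
      				VAL = READ_ONCE(*__PTR);		\
      			} while (!(cond_expr));				\
      			spin_end();		/* HMT_medium() */	\
      		}							\
      		(typeof(*ptr))VAL;					\
      	})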
      
      Data is from three benchmarks on a Power9 9008-22L: 64 CPUs, with
      2 sockets and 8 threads per core.
      
      1. locktorture.
      
      This is data for the lowest-level and most artificial/pathological case,
      with increasing thread counts pounding on the lock. The metric is total
      ops/minute. Despite some small hits in the 4-8 task range, scenarios are
      either neutral or favorable to this patch.
      
      +=========+==========+==========+=======+
      | # tasks | vanilla  | dirty    | %diff |
      +=========+==========+==========+=======+
      | 2       | 46718565 | 48751350 | 4.35  |
      +---------+----------+----------+-------+
      | 4       | 51740198 | 50369082 | -2.65 |
      +---------+----------+----------+-------+
      | 8       | 63756510 | 62568821 | -1.86 |
      +---------+----------+----------+-------+
      | 16      | 67824531 | 70966546 | 4.63  |
      +---------+----------+----------+-------+
      | 32      | 53843519 | 61155508 | 13.58 |
      +---------+----------+----------+-------+
      | 64      | 53005778 | 53104412 | 0.18  |
      +---------+----------+----------+-------+
      | 128     | 53331980 | 54606910 | 2.39  |
      +=========+==========+==========+=======+
      
      2. sockperf (tcp throughput)
      
      Here a client does one-way throughput tests to a localhost server, with
      increasing message sizes, contending on the sk_lock. This patch brings
      qspinlock performance back on par with the simple spinlock:
      
      		     simple-spinlock           vanilla			dirty
      Hmean     14        73.50 (   0.00%)       54.44 * -25.93%*       73.45 * -0.07%*
      Hmean     100      654.47 (   0.00%)      385.61 * -41.08%*      771.43 * 17.87%*
      Hmean     300     2719.39 (   0.00%)     2181.67 * -19.77%*     2666.50 * -1.94%*
      Hmean     500     4400.59 (   0.00%)     3390.77 * -22.95%*     4322.14 * -1.78%*
      Hmean     850     6726.21 (   0.00%)     5264.03 * -21.74%*     6863.12 * 2.04%*
      
      3. dbench (tmpfs)
      
      Configured to run with up to ncpus*8 clients, it reports both latency and
      throughput metrics. For latency, with the exception of the 64-client
      case, there is really nothing of note:
      				     vanilla                dirty
      Amean     latency-1          1.67 (   0.00%)        1.67 *   0.09%*
      Amean     latency-2          2.15 (   0.00%)        2.08 *   3.36%*
      Amean     latency-4          2.50 (   0.00%)        2.56 *  -2.27%*
      Amean     latency-8          2.49 (   0.00%)        2.48 *   0.31%*
      Amean     latency-16         2.69 (   0.00%)        2.72 *  -1.37%*
      Amean     latency-32         2.96 (   0.00%)        3.04 *  -2.60%*
      Amean     latency-64         7.78 (   0.00%)        8.17 *  -5.07%*
      Amean     latency-512      186.91 (   0.00%)      186.41 *   0.27%*
      
      For dbench4 throughput (a misleading but traditional metric) there is a
      small but fairly consistent improvement:
      
      			     vanilla                dirty
      Hmean     1        849.13 (   0.00%)      851.51 *   0.28%*
      Hmean     2       1664.03 (   0.00%)     1663.94 *  -0.01%*
      Hmean     4       3073.70 (   0.00%)     3104.29 *   1.00%*
      Hmean     8       5624.02 (   0.00%)     5694.16 *   1.25%*
      Hmean     16      9169.49 (   0.00%)     9324.43 *   1.69%*
      Hmean     32     11969.37 (   0.00%)    12127.09 *   1.32%*
      Hmean     64     15021.12 (   0.00%)    15243.14 *   1.48%*
      Hmean     512    14891.27 (   0.00%)    15162.11 *   1.82%*
      
      Measuring dbench4 per-VFS-operation latency shows some very minor
      differences within the noise level, in the 0-1% range.
      
      Fixes: 49a7d46a ("powerpc: Implement smp_cond_load_relaxed()")
      Acked-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210318204702.71417-1-dave@stgolabs.net