Commit 5ee121a3 authored by Palmer Dabbelt's avatar Palmer Dabbelt

Merge patch series "riscv: Apply Zawrs when available"

Andrew Jones <ajones@ventanamicro.com> says:

Zawrs provides two instructions (wrs.nto and wrs.sto), where both are
meant to allow the hart to enter a low-power state while waiting on a
store to a memory location. The instructions also both wait an
implementation-defined "short" duration (unless the implementation
terminates the stall for another reason). The difference is that while
wrs.sto will terminate when the duration elapses, wrs.nto, depending on
configuration, will either just keep waiting or an ILL exception will be
raised. Linux will use wrs.nto, so if platforms have an implementation
which falls in the "just keep waiting" category (which is not expected),
then it should _not_ advertise Zawrs in the hardware description.

Like wfi (and with the same {m,h}status bits to configure it), when
wrs.nto is configured to raise exceptions it's expected that the higher
privilege level will see the instruction was a wait instruction, do
something, and then resume execution following the instruction. For
example, KVM does configure exceptions for wfi (hstatus.VTW=1) and
therefore also for wrs.nto. KVM does this for wfi since it's better to
allow other tasks to be scheduled while a VCPU waits for an interrupt.
For waits such as those where wrs.nto/sto would be used, which are
typically locks, it is also a good idea for KVM to be involved, as it
can attempt to schedule the lock holding VCPU.

This series starts with Christoph's addition of the riscv
smp_cond_load_relaxed function which applies wrs.sto when available.
That patch has been reworked to use wrs.nto and to use the same approach
as Arm for the wait loop, since we can't have arbitrary C code between
the load-reserved and the wrs. Then, hwprobe support is added (since the
instructions are also usable from usermode), and finally KVM is
taught about wrs.nto, allowing guests to see and use the Zawrs
extension.

We still don't have test results from hardware, and it's not possible to
prove that using Zawrs is a win when testing on QEMU, not even when
oversubscribing VCPUs to guests. However, it is possible to use KVM
selftests to force a scenario where we can prove Zawrs does its job and
does it well. [4] is a test which does this and, on my machine, without
Zawrs it takes 16 seconds to complete and with Zawrs it takes 0.25
seconds.

This series is also available here [1]. In order to use QEMU for testing
a build with [2] is needed. In order to enable guests to use Zawrs with
KVM using kvmtool, the branch at [3] may be used.

[1] https://github.com/jones-drew/linux/commits/riscv/zawrs-v3/
[2] https://lore.kernel.org/all/20240312152901.512001-2-ajones@ventanamicro.com/
[3] https://github.com/jones-drew/kvmtool/commits/riscv/zawrs/
[4] https://github.com/jones-drew/linux/commit/cb2beccebcece10881db842ed69bdd5715cfab5d

Link: https://lore.kernel.org/r/20240426100820.14762-8-ajones@ventanamicro.com

* b4-shazam-merge:
  KVM: riscv: selftests: Add Zawrs extension to get-reg-list test
  KVM: riscv: Support guest wrs.nto
  riscv: hwprobe: export Zawrs ISA extension
  riscv: Add Zawrs support for spinlocks
  dt-bindings: riscv: Add Zawrs ISA extension description
  riscv: Provide a definition for 'pause'
Signed-off-by: default avatarPalmer Dabbelt <palmer@rivosinc.com>
parents c9b8cd13 f2c43c61
......@@ -235,6 +235,10 @@ The following keys are defined:
supported as defined in the RISC-V ISA manual starting from commit
c732a4f39a4 ("Zcmop is ratified/1.0").
* :c:macro:`RISCV_HWPROBE_EXT_ZAWRS`: The Zawrs extension is supported as
ratified in commit 98918c844281 ("Merge pull request #1217 from
riscv/zawrs") of riscv-isa-manual.
* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A bitmask that contains performance
information about the selected set of processors.
......
......@@ -177,6 +177,13 @@ properties:
is supported as ratified at commit 5059e0ca641c ("update to
ratified") of the riscv-zacas.
- const: zawrs
description: |
The Zawrs extension for entering a low-power state or for trapping
to a hypervisor while waiting on a store to a memory location, as
ratified in commit 98918c844281 ("Merge pull request #1217 from
riscv/zawrs") of riscv-isa-manual.
- const: zba
description: |
The standard Zba bit-manipulation extension for address generation
......
......@@ -600,6 +600,19 @@ config RISCV_ISA_V_PREEMPTIVE
preemption. Enabling this config will result in higher memory
consumption due to the allocation of per-task's kernel Vector context.
config RISCV_ISA_ZAWRS
bool "Zawrs extension support for more efficient busy waiting"
depends on RISCV_ALTERNATIVE
default y
help
The Zawrs extension defines instructions to be used in polling loops
which allow a hart to enter a low-power state or to trap to the
hypervisor while waiting on a store to a memory location. Enable the
use of these instructions in the kernel when the Zawrs extension is
detected at boot.
If you don't know what to do here, say Y.
config TOOLCHAIN_HAS_ZBB
bool
default y
......@@ -682,13 +695,6 @@ config RISCV_ISA_ZICBOZ
If you don't know what to do here, say Y.
config TOOLCHAIN_HAS_ZIHINTPAUSE
bool
default y
depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zihintpause)
depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zihintpause)
depends on LLD_VERSION >= 150000 || LD_VERSION >= 23600
config TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
def_bool y
# https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=aed44286efa8ae8717a77d94b51ac3614e2ca6dc
......
......@@ -82,9 +82,6 @@ else
riscv-march-$(CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI) := $(riscv-march-y)_zicsr_zifencei
endif
# Check if the toolchain supports Zihintpause extension
riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE) := $(riscv-march-y)_zihintpause
# Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
# matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
......
......@@ -11,6 +11,7 @@
#define _ASM_RISCV_BARRIER_H
#ifndef __ASSEMBLY__
#include <asm/cmpxchg.h>
#include <asm/fence.h>
#define nop() __asm__ __volatile__ ("nop")
......@@ -28,21 +29,6 @@
#define __smp_rmb() RISCV_FENCE(r, r)
#define __smp_wmb() RISCV_FENCE(w, w)
#define __smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
RISCV_FENCE(rw, w); \
WRITE_ONCE(*p, v); \
} while (0)
#define __smp_load_acquire(p) \
({ \
typeof(*p) ___p1 = READ_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
RISCV_FENCE(r, rw); \
___p1; \
})
/*
* This is a very specific barrier: it's currently only used in two places in
* the kernel, both in the scheduler. See include/linux/spinlock.h for the two
......@@ -70,6 +56,35 @@ do { \
*/
#define smp_mb__after_spinlock() RISCV_FENCE(iorw, iorw)
#define __smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
RISCV_FENCE(rw, w); \
WRITE_ONCE(*p, v); \
} while (0)
#define __smp_load_acquire(p) \
({ \
typeof(*p) ___p1 = READ_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
RISCV_FENCE(r, rw); \
___p1; \
})
#ifdef CONFIG_RISCV_ISA_ZAWRS
#define smp_cond_load_relaxed(ptr, cond_expr) ({ \
typeof(ptr) __PTR = (ptr); \
__unqual_scalar_typeof(*ptr) VAL; \
for (;;) { \
VAL = READ_ONCE(*__PTR); \
if (cond_expr) \
break; \
__cmpwait_relaxed(ptr, VAL); \
} \
(typeof(*ptr))VAL; \
})
#endif
#include <asm-generic/barrier.h>
#endif /* __ASSEMBLY__ */
......
......@@ -8,7 +8,10 @@
#include <linux/bug.h>
#include <asm/alternative-macros.h>
#include <asm/fence.h>
#include <asm/hwcap.h>
#include <asm/insn-def.h>
#define __arch_xchg_masked(prepend, append, r, p, n) \
({ \
......@@ -221,4 +224,59 @@
arch_cmpxchg_release((ptr), (o), (n)); \
})
#ifdef CONFIG_RISCV_ISA_ZAWRS
/*
* Despite wrs.nto being "WRS-with-no-timeout", in the absence of changes to
* @val we expect it to still terminate within a "reasonable" amount of time
* for an implementation-specific other reason, a pending, locally-enabled
* interrupt, or because it has been configured to raise an illegal
* instruction exception.
*/
static __always_inline void __cmpwait(volatile void *ptr,
unsigned long val,
int size)
{
unsigned long tmp;
asm goto(ALTERNATIVE("j %l[no_zawrs]", "nop",
0, RISCV_ISA_EXT_ZAWRS, 1)
: : : : no_zawrs);
switch (size) {
case 4:
asm volatile(
" lr.w %0, %1\n"
" xor %0, %0, %2\n"
" bnez %0, 1f\n"
ZAWRS_WRS_NTO "\n"
"1:"
: "=&r" (tmp), "+A" (*(u32 *)ptr)
: "r" (val));
break;
#if __riscv_xlen == 64
case 8:
asm volatile(
" lr.d %0, %1\n"
" xor %0, %0, %2\n"
" bnez %0, 1f\n"
ZAWRS_WRS_NTO "\n"
"1:"
: "=&r" (tmp), "+A" (*(u64 *)ptr)
: "r" (val));
break;
#endif
default:
BUILD_BUG();
}
return;
no_zawrs:
asm volatile(RISCV_PAUSE : : : "memory");
}
#define __cmpwait_relaxed(ptr, val) \
__cmpwait((ptr), (unsigned long)(val), sizeof(*(ptr)))
#endif
#endif /* _ASM_RISCV_CMPXCHG_H */
......@@ -92,6 +92,7 @@
#define RISCV_ISA_EXT_ZCD 83
#define RISCV_ISA_EXT_ZCF 84
#define RISCV_ISA_EXT_ZCMOP 85
#define RISCV_ISA_EXT_ZAWRS 86
#define RISCV_ISA_EXT_XLINUXENVCFG 127
......
......@@ -196,4 +196,8 @@
INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0), \
RS1(base), SIMM12(4))
#define RISCV_PAUSE ".4byte 0x100000f"
#define ZAWRS_WRS_NTO ".4byte 0x00d00073"
#define ZAWRS_WRS_STO ".4byte 0x01d00073"
#endif /* __ASM_INSN_DEF_H */
......@@ -80,6 +80,7 @@ struct kvm_vcpu_stat {
struct kvm_vcpu_stat_generic generic;
u64 ecall_exit_stat;
u64 wfi_exit_stat;
u64 wrs_exit_stat;
u64 mmio_exit_user;
u64 mmio_exit_kernel;
u64 csr_exit_user;
......
......@@ -5,6 +5,7 @@
#ifndef __ASSEMBLY__
#include <asm/barrier.h>
#include <asm/insn-def.h>
static inline void cpu_relax(void)
{
......@@ -14,16 +15,11 @@ static inline void cpu_relax(void)
__asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
#endif
#ifdef CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE
/*
* Reduce instruction retirement.
* This assumes the PC changes.
*/
__asm__ __volatile__ ("pause");
#else
/* Encoding of the pause instruction */
__asm__ __volatile__ (".4byte 0x100000F");
#endif
__asm__ __volatile__ (RISCV_PAUSE);
barrier();
}
......
......@@ -71,6 +71,7 @@ struct riscv_hwprobe {
#define RISCV_HWPROBE_EXT_ZCD (1ULL << 45)
#define RISCV_HWPROBE_EXT_ZCF (1ULL << 46)
#define RISCV_HWPROBE_EXT_ZCMOP (1ULL << 47)
#define RISCV_HWPROBE_EXT_ZAWRS (1ULL << 48)
#define RISCV_HWPROBE_KEY_CPUPERF_0 5
#define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0)
#define RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0)
......
......@@ -174,6 +174,7 @@ enum KVM_RISCV_ISA_EXT_ID {
KVM_RISCV_ISA_EXT_ZCD,
KVM_RISCV_ISA_EXT_ZCF,
KVM_RISCV_ISA_EXT_ZCMOP,
KVM_RISCV_ISA_EXT_ZAWRS,
KVM_RISCV_ISA_EXT_MAX,
};
......
......@@ -347,6 +347,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
__RISCV_ISA_EXT_DATA(zihpm, RISCV_ISA_EXT_ZIHPM),
__RISCV_ISA_EXT_DATA(zimop, RISCV_ISA_EXT_ZIMOP),
__RISCV_ISA_EXT_DATA(zacas, RISCV_ISA_EXT_ZACAS),
__RISCV_ISA_EXT_DATA(zawrs, RISCV_ISA_EXT_ZAWRS),
__RISCV_ISA_EXT_DATA(zfa, RISCV_ISA_EXT_ZFA),
__RISCV_ISA_EXT_DATA(zfh, RISCV_ISA_EXT_ZFH),
__RISCV_ISA_EXT_DATA(zfhmin, RISCV_ISA_EXT_ZFHMIN),
......
......@@ -117,6 +117,7 @@ static void hwprobe_isa_ext0(struct riscv_hwprobe *pair,
EXT_KEY(ZCA);
EXT_KEY(ZCB);
EXT_KEY(ZCMOP);
EXT_KEY(ZAWRS);
/*
* All the following extensions must depend on the kernel
......
......@@ -25,6 +25,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, ecall_exit_stat),
STATS_DESC_COUNTER(VCPU, wfi_exit_stat),
STATS_DESC_COUNTER(VCPU, wrs_exit_stat),
STATS_DESC_COUNTER(VCPU, mmio_exit_user),
STATS_DESC_COUNTER(VCPU, mmio_exit_kernel),
STATS_DESC_COUNTER(VCPU, csr_exit_user),
......
......@@ -16,6 +16,9 @@
#define INSN_MASK_WFI 0xffffffff
#define INSN_MATCH_WFI 0x10500073
#define INSN_MASK_WRS 0xffffffff
#define INSN_MATCH_WRS 0x00d00073
#define INSN_MATCH_CSRRW 0x1073
#define INSN_MASK_CSRRW 0x707f
#define INSN_MATCH_CSRRS 0x2073
......@@ -203,6 +206,13 @@ static int wfi_insn(struct kvm_vcpu *vcpu, struct kvm_run *run, ulong insn)
return KVM_INSN_CONTINUE_NEXT_SEPC;
}
static int wrs_insn(struct kvm_vcpu *vcpu, struct kvm_run *run, ulong insn)
{
vcpu->stat.wrs_exit_stat++;
kvm_vcpu_on_spin(vcpu, vcpu->arch.guest_context.sstatus & SR_SPP);
return KVM_INSN_CONTINUE_NEXT_SEPC;
}
struct csr_func {
unsigned int base;
unsigned int count;
......@@ -378,6 +388,11 @@ static const struct insn_func system_opcode_funcs[] = {
.match = INSN_MATCH_WFI,
.func = wfi_insn,
},
{
.mask = INSN_MASK_WRS,
.match = INSN_MATCH_WRS,
.func = wrs_insn,
},
};
static int system_opcode_insn(struct kvm_vcpu *vcpu, struct kvm_run *run,
......
......@@ -42,6 +42,7 @@ static const unsigned long kvm_isa_ext_arr[] = {
KVM_ISA_EXT_ARR(SVNAPOT),
KVM_ISA_EXT_ARR(SVPBMT),
KVM_ISA_EXT_ARR(ZACAS),
KVM_ISA_EXT_ARR(ZAWRS),
KVM_ISA_EXT_ARR(ZBA),
KVM_ISA_EXT_ARR(ZBB),
KVM_ISA_EXT_ARR(ZBC),
......@@ -132,6 +133,7 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
case KVM_RISCV_ISA_EXT_SVINVAL:
case KVM_RISCV_ISA_EXT_SVNAPOT:
case KVM_RISCV_ISA_EXT_ZACAS:
case KVM_RISCV_ISA_EXT_ZAWRS:
case KVM_RISCV_ISA_EXT_ZBA:
case KVM_RISCV_ISA_EXT_ZBB:
case KVM_RISCV_ISA_EXT_ZBC:
......
......@@ -49,6 +49,7 @@ bool filter_reg(__u64 reg)
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVNAPOT:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVPBMT:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZACAS:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZAWRS:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBA:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBB:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBC:
......@@ -421,6 +422,7 @@ static const char *isa_ext_single_id_to_str(__u64 reg_off)
KVM_ISA_EXT_ARR(SVNAPOT),
KVM_ISA_EXT_ARR(SVPBMT),
KVM_ISA_EXT_ARR(ZACAS),
KVM_ISA_EXT_ARR(ZAWRS),
KVM_ISA_EXT_ARR(ZBA),
KVM_ISA_EXT_ARR(ZBB),
KVM_ISA_EXT_ARR(ZBC),
......@@ -951,6 +953,7 @@ KVM_ISA_EXT_SIMPLE_CONFIG(svinval, SVINVAL);
KVM_ISA_EXT_SIMPLE_CONFIG(svnapot, SVNAPOT);
KVM_ISA_EXT_SIMPLE_CONFIG(svpbmt, SVPBMT);
KVM_ISA_EXT_SIMPLE_CONFIG(zacas, ZACAS);
KVM_ISA_EXT_SIMPLE_CONFIG(zawrs, ZAWRS);
KVM_ISA_EXT_SIMPLE_CONFIG(zba, ZBA);
KVM_ISA_EXT_SIMPLE_CONFIG(zbb, ZBB);
KVM_ISA_EXT_SIMPLE_CONFIG(zbc, ZBC);
......@@ -1013,6 +1016,7 @@ struct vcpu_reg_list *vcpu_configs[] = {
&config_svnapot,
&config_svpbmt,
&config_zacas,
&config_zawrs,
&config_zba,
&config_zbb,
&config_zbc,
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment