An error occurred fetching the project authors.
- 08 Feb, 2024 3 commits
-
-
Paolo Bonzini authored
KVM uses __KVM_HAVE_* symbols in the architecture-dependent uapi/asm/kvm.h to mask unused definitions in include/uapi/linux/kvm.h. __KVM_HAVE_READONLY_MEM however was nothing but a misguided attempt to define KVM_CAP_READONLY_MEM only on architectures where KVM_CHECK_EXTENSION(KVM_CAP_READONLY_MEM) could possibly return nonzero. This however does not make sense, and it prevented userspace from supporting this architecture-independent feature without recompilation. Therefore, these days __KVM_HAVE_READONLY_MEM does not mask anything and is only used in virt/kvm/kvm_main.c. Userspace does not need to test it and there should be no need for it to exist. Remove it and replace it with a Kconfig symbol within Linux source code. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
While this in principle breaks userspace code that mentions KVM_ARM_DEV_* on architectures other than aarch64, this seems unlikely to be a problem considering that run->s.regs.device_irq_level is only defined on that architecture. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Change uapi header uses of GENMASK to instead use the uapi/linux/bits.h bit macros, since GENMASK is not defined in uapi headers. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 04 Oct, 2023 1 commit
-
-
Jing Zhang authored
While the Feature ID range is well defined and pretty large, it isn't inconceivable that the architecture will eventually grow some other ranges that will need to similarly be described to userspace. Add a VM ioctl to allow userspace to get writable masks for feature ID registers in below system register space: op0 = 3, op1 = {0, 1, 3}, CRn = 0, CRm = {0 - 7}, op2 = {0 - 7} This is used to support mix-and-match userspace and kernels for writable ID registers, where userspace may want to know upfront whether it can actually tweak the contents of an idreg or not. Add a new capability (KVM_CAP_ARM_SUPPORTED_FEATURE_ID_RANGES) that returns a bitmap of the valid ranges, which can subsequently be retrieved, one at a time by setting the index of the set bit as the range identifier. Suggested-by:
Marc Zyngier <maz@kernel.org> Suggested-by:
Cornelia Huck <cohuck@redhat.com> Signed-off-by:
Jing Zhang <jingzhangos@google.com> Reviewed-by:
Cornelia Huck <cohuck@redhat.com> Reviewed-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-2-oliver.upton@linux.devSigned-off-by:
Oliver Upton <oliver.upton@linux.dev>
-
- 05 Apr, 2023 4 commits
-
-
Marc Zyngier authored
When returning to userspace to handle a SMCCC call, we consistently set PC to point to the instruction immediately after the HVC/SMC. However, should userspace need to know the exact address of the trapping instruction, it needs to know about the *size* of that instruction. For AArch64, this is pretty easy. For AArch32, this is a bit more funky, as Thumb has 16bit encodings for both HVC and SMC. Expose this to userspace with a new flag that directly derives from ESR_EL2.IL. Also update the documentation to reflect the PC state at the point of exit. Finally, this fixes a small buglet where the hypercall.{args,ret} fields would not be cleared on exit, and could contain some random junk. Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/86pm8iv8tj.wl-maz@kernel.org
-
Oliver Upton authored
As the SMCCC (and related specifications) march towards an 'everything and the kitchen sink' interface for interacting with a system it becomes less likely that KVM will support every related feature. We could do better by letting userspace have a crack at it instead. Allow userspace to define an 'SMCCC filter' that applies to both HVCs and SMCs initiated by the guest. Supporting both conduits with this interface is important for a couple of reasons. Guest SMC usage is table stakes for a nested guest, as HVCs are always taken to the virtual EL2. Additionally, guests may want to interact with a service on the secure side which can now be proxied by userspace. Signed-off-by:
Oliver Upton <oliver.upton@linux.dev> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230404154050.2270077-10-oliver.upton@linux.dev
-
Oliver Upton authored
In anticipation of user hypercall filters, add the necessary plumbing to get SMCCC calls out to userspace. Even though the exit structure has space for KVM to pass register arguments, let's just avoid it altogether and let userspace poke at the registers via KVM_GET_ONE_REG. This deliberately stretches the definition of a 'hypercall' to cover SMCs from EL1 in addition to the HVCs we know and love. KVM doesn't support EL1 calls into secure services, but now we can paint that as a userspace problem and be done with it. Finally, we need a flag to let userspace know what conduit instruction was used (i.e. SMC vs. HVC). Signed-off-by:
Oliver Upton <oliver.upton@linux.dev> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230404154050.2270077-9-oliver.upton@linux.dev
-
Oliver Upton authored
KVM presently allows userspace to filter guest hypercalls with bitmaps expressed via pseudo-firmware registers. These bitmaps have a narrow scope and, of course, can only allow/deny a particular call. A subsequent change to KVM will introduce a generalized UAPI for filtering hypercalls, allowing functions to be forwarded to userspace. Refactor the existing hypercall filtering logic to make room for more than two actions. While at it, generalize the function names around SMCCC as it is the basis for the upcoming UAPI. No functional change intended. Reviewed-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by:
Oliver Upton <oliver.upton@linux.dev> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230404154050.2270077-7-oliver.upton@linux.dev
-
- 30 Mar, 2023 2 commits
-
-
Marc Zyngier authored
Emulating EL2 also means emulating the EL2 timers. To do so, we expand our timer framework to deal with at most 4 timers. At any given time, two timers are using the HW timers, and the two others are purely emulated. The role of deciding which is which at any given time is left to a mapping function which is called every time we need to make such a decision. Reviewed-by:
Colton Lewis <coltonlewis@google.com> Co-developed-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-18-maz@kernel.org
-
Marc Zyngier authored
And this is the moment you have all been waiting for: setting the counter offset from userspace. We expose a brand new capability that reports the ability to set the offset for both the virtual and physical sides. In keeping with the architecture, the offset is expressed as a delta that is substracted from the physical counter value. Once this new API is used, there is no going back, and the counters cannot be written to to set the offsets implicitly (the writes are instead ignored). Reviewed-by:
Colton Lewis <coltonlewis@google.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-8-maz@kernel.org
-
- 11 Feb, 2023 1 commit
-
-
Christoffer Dall authored
Introduce the feature bit and a primitive that checks if the feature is set behind a static key check based on the cpus_have_const_cap check. Checking vcpu_has_nv() on systems without nested virt enabled should have negligible overhead. We don't yet allow userspace to actually set this feature. Reviewed-by:
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Reviewed-by:
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-4-maz@kernel.orgSigned-off-by:
Oliver Upton <oliver.upton@linux.dev>
-
- 10 Nov, 2022 1 commit
-
-
Gavin Shan authored
Enable ring-based dirty memory tracking on ARM64: - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL. - Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP. - Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page offset. - Add ARM64 specific kvm_arch_allow_write_without_running_vcpu() to keep the site of saving vgic/its tables out of the no-running-vcpu radar. Signed-off-by:
Gavin Shan <gshan@redhat.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110104914.31280-5-gshan@redhat.com
-
- 10 Aug, 2022 1 commit
-
-
Yang Yingliang authored
Use GENMASK() to generate the masks of device type and device id, fixing compilation errors due to the sign extension when using older versions of GCC (such as is 7.5): In function ‘kvm_vm_ioctl_set_device_addr.isra.38’, inlined from ‘kvm_arch_vm_ioctl’ at arch/arm64/kvm/arm.c:1454:10: ././include/linux/compiler_types.h:354:38: error: call to ‘__compiletime_assert_599’ \ declared with attribute error: FIELD_GET: mask is not constant _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) Fixes: 9f968c92 ("KVM: arm64: vgic-v2: Add helper for legacy dist/cpuif base address setting") Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> [maz: tidy up commit message] Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220810013435.1525363-1-yangyingliang@huawei.com
-
- 15 May, 2022 1 commit
-
-
Marc Zyngier authored
These constants will change over time, and userspace has no business knowing about them. Hide them behind __KERNEL__. Signed-off-by:
Marc Zyngier <maz@kernel.org>
-
- 03 May, 2022 3 commits
-
-
Raghavendra Rao Ananta authored
Introduce the firmware register to hold the vendor specific hypervisor service calls (owner value 6) as a bitmap. The bitmap represents the features that'll be enabled for the guest, as configured by the user-space. Currently, this includes support for KVM-vendor features along with reading the UID, represented by bit-0, and Precision Time Protocol (PTP), represented by bit-1. Signed-off-by:
Raghavendra Rao Ananta <rananta@google.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> [maz: tidy-up bitmap values] Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220502233853.1233742-5-rananta@google.com
-
Raghavendra Rao Ananta authored
Introduce the firmware register to hold the standard hypervisor service calls (owner value 5) as a bitmap. The bitmap represents the features that'll be enabled for the guest, as configured by the user-space. Currently, this includes support only for Paravirtualized time, represented by bit-0. Signed-off-by:
Raghavendra Rao Ananta <rananta@google.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> [maz: tidy-up bitmap values] Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220502233853.1233742-4-rananta@google.com
-
Raghavendra Rao Ananta authored
KVM regularly introduces new hypercall services to the guests without any consent from the userspace. This means, the guests can observe hypercall services in and out as they migrate across various host kernel versions. This could be a major problem if the guest discovered a hypercall, started using it, and after getting migrated to an older kernel realizes that it's no longer available. Depending on how the guest handles the change, there's a potential chance that the guest would just panic. As a result, there's a need for the userspace to elect the services that it wishes the guest to discover. It can elect these services based on the kernels spread across its (migration) fleet. To remedy this, extend the existing firmware pseudo-registers, such as KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space for all the hypercall services available. These firmware registers are categorized based on the service call owners, but unlike the existing firmware pseudo-registers, they hold the features supported in the form of a bitmap. During the VM initialization, the registers are set to upper-limit of the features supported by the corresponding registers. It's expected that the VMMs discover the features provided by each register via GET_ONE_REG, and write back the desired values using SET_ONE_REG. KVM allows this modification only until the VM has started. Some of the standard features are not mapped to any bits of the registers. But since they can recreate the original problem of making it available without userspace's consent, they need to be explicitly added to the case-list in kvm_hvc_call_default_allowed(). Any function-id that's not enabled via the bitmap, or not listed in kvm_hvc_call_default_allowed, will be returned as SMCCC_RET_NOT_SUPPORTED to the guest. Older userspace code can simply ignore the feature and the hypercall services will be exposed unconditionally to the guests, thus ensuring backward compatibility. In this patch, the framework adds the register only for ARM's standard secure services (owner value 4). Currently, this includes support only for ARM True Random Number Generator (TRNG) service, with bit-0 of the register representing mandatory features of v1.0. Other services are momentarily added in the upcoming patches. Signed-off-by:
Raghavendra Rao Ananta <rananta@google.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> [maz: reduced the scope of some helpers, tidy-up bitmap max values, dropped error-only fast path] Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220502233853.1233742-3-rananta@google.com
-
- 29 Apr, 2022 1 commit
-
-
Alexandru Elisei authored
When userspace is debugging a VM, the kvm_debug_exit_arch part of the kvm_run struct contains arm64 specific debug information: the ESR_EL2 value, encoded in the field "hsr", and the address of the instruction that caused the exception, encoded in the field "far". Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately kvm_debug_exit_arch.hsr cannot be changed because that would change the memory layout of the struct on big endian machines: Current layout: | Layout with "hsr" extended to 64 bits: | offset 0: ESR_EL2[31:0] (hsr) | offset 0: ESR_EL2[61:32] (hsr[61:32]) offset 4: padding | offset 4: ESR_EL2[31:0] (hsr[31:0]) offset 8: FAR_EL2[61:0] (far) | offset 8: FAR_EL2[61:0] (far) which breaks existing code. The padding is inserted by the compiler because the "far" field must be aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1], page 18), and the struct itself must be aligned to 8 bytes (the struct must be aligned to the maximum alignment of its fields - aapcs64, page 18), which means that "hsr" must be aligned to 8 bytes as it is the first field in the struct. To avoid changing the struct size and layout for the existing fields, add a new field, "hsr_high", which replaces the existing padding. "hsr_high" will be used to hold the ESR_EL2[61:32] bits of the register. The memory layout, both on big and little endian machine, becomes: offset 0: ESR_EL2[31:0] (hsr) offset 4: ESR_EL2[61:32] (hsr_high) offset 8: FAR_EL2[61:0] (far) The padding that the compiler inserts for the current struct layout is unitialized. To prevent an updated userspace running on an old kernel mistaking the padding for a valid "hsr_high" value, add a new flag, KVM_DEBUG_ARCH_HSR_HIGH_VALID, to kvm_run->flags to let userspace know that "hsr_high" holds a valid ESR_EL2[61:32] value. [1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdfSigned-off-by:
Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220425114444.368693-6-alexandru.elisei@arm.comSigned-off-by:
Catalin Marinas <catalin.marinas@arm.com>
-
- 24 Feb, 2022 1 commit
-
-
James Morse authored
KVM allows the guest to discover whether the ARCH_WORKAROUND SMCCC are implemented, and to preserve that state during migration through its firmware register interface. Add the necessary boiler plate for SMCCC_ARCH_WORKAROUND_3. Reviewed-by:
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by:
Catalin Marinas <catalin.marinas@arm.com> Signed-off-by:
James Morse <james.morse@arm.com>
-
- 21 Feb, 2022 1 commit
-
-
Will Deacon authored
When handling reset and power-off PSCI calls from the guest, we initialise X0 to PSCI_RET_INTERNAL_FAILURE in case the VMM tries to re-run the vCPU after issuing the call. Unfortunately, this also means that the VMM cannot see which PSCI call was issued and therefore cannot distinguish between PSCI SYSTEM_RESET and SYSTEM_RESET2 calls, which is necessary in order to determine the validity of the "reset_type" in X1. Allocate bit 0 of the previously unused 'flags' field of the system_event structure so that we can indicate the PSCI call used to initiate the reset. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by:
Will Deacon <will@kernel.org> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220221153524.15397-4-will@kernel.org
-
- 08 Feb, 2022 2 commits
-
-
Alexandru Elisei authored
Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU device ioctl. If the VCPU is scheduled on a physical CPU which has a different PMU, the perf events needed to emulate a guest PMU won't be scheduled in and the guest performance counters will stop counting. Treat it as an userspace error and refuse to run the VCPU in this situation. Suggested-by:
Marc Zyngier <maz@kernel.org> Signed-off-by:
Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-7-alexandru.elisei@arm.com
-
Alexandru Elisei authored
When KVM creates an event and there are more than one PMUs present on the system, perf_init_event() will go through the list of available PMUs and will choose the first one that can create the event. The order of the PMUs in this list depends on the probe order, which can change under various circumstances, for example if the order of the PMU nodes change in the DTB or if asynchronous driver probing is enabled on the kernel command line (with the driver_async_probe=armv8-pmu option). Another consequence of this approach is that on heteregeneous systems all virtual machines that KVM creates will use the same PMU. This might cause unexpected behaviour for userspace: when a VCPU is executing on the physical CPU that uses this default PMU, PMU events in the guest work correctly; but when the same VCPU executes on another CPU, PMU events in the guest will suddenly stop counting. Fortunately, perf core allows user to specify on which PMU to create an event by using the perf_event_attr->type field, which is used by perf_init_event() as an index in the radix tree of available PMUs. Add the KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU attribute to allow userspace to specify the arm_pmu that KVM will use when creating events for that VCPU. KVM will make no attempt to run the VCPU on the physical CPUs that share the PMU, leaving it up to userspace to manage the VCPU threads' affinity accordingly. To ensure that KVM doesn't expose an asymmetric system to the guest, the PMU set for one VCPU will be used by all other VCPUs. Once a VCPU has run, the PMU cannot be changed in order to avoid changing the list of available events for a VCPU, or to change the semantics of existing events. Signed-off-by:
Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-6-alexandru.elisei@arm.com
-
- 22 Jun, 2021 1 commit
-
-
Steven Price authored
The VMM may not wish to have it's own mapping of guest memory mapped with PROT_MTE because this causes problems if the VMM has tag checking enabled (the guest controls the tags in physical RAM and it's unlikely the tags are correct for the VMM). Instead add a new ioctl which allows the VMM to easily read/write the tags from guest memory, allowing the VMM's mapping to be non-PROT_MTE while the VMM can still read/write the tags for the purpose of migration. Reviewed-by:
Catalin Marinas <catalin.marinas@arm.com> Signed-off-by:
Steven Price <steven.price@arm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210621111716.37157-6-steven.price@arm.com
-
- 27 Nov, 2020 1 commit
-
-
Will Deacon authored
'struct kvm_arch_memory_slot' isn't part of the user ABI, so move it out of the uapi/ headers in case we start using it in future and accidentally back ourselves into a corner. Signed-off-by:
Will Deacon <will@kernel.org> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201118194402.2892-2-will@kernel.org
-
- 29 Sep, 2020 2 commits
-
-
Marc Zyngier authored
Owing to the fact that the host kernel is always mitigated, we can drastically simplify the WA2 handling by keeping the mitigation state ON when entering the guest. This means the guest is either unaffected or not mitigated. This results in a nice simplification of the mitigation space, and the removal of a lot of code that was never really used anyway. Signed-off-by:
Marc Zyngier <maz@kernel.org> Signed-off-by:
Will Deacon <will@kernel.org>
-
Marc Zyngier authored
It can be desirable to expose a PMU to a guest, and yet not want the guest to be able to count some of the implemented events (because this would give information on shared resources, for example. For this, let's extend the PMUv3 device API, and offer a way to setup a bitmap of the allowed events (the default being no bitmap, and thus no filtering). Userspace can thus allow/deny ranges of event. The default policy depends on the "polarity" of the first filter setup (default deny if the filter allows events, and default allow if the filter denies events). This allows to setup exactly what is allowed for a given guest. Note that although the ioctl is per-vcpu, the map of allowed events is global to the VM (it can be setup from any vcpu until the vcpu PMU is initialized). Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Marc Zyngier <maz@kernel.org>
-
- 23 Jan, 2020 1 commit
-
-
Andrew Jones authored
Two UAPI system register IDs do not derive their values from the ARM system register encodings. This is because their values were accidentally swapped. As the IDs are API, they cannot be changed. Add WARNING notes to point them out. Suggested-by:
Marc Zyngier <maz@kernel.org> Signed-off-by:
Andrew Jones <drjones@redhat.com> [maz: turned XXX into WARNING] Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200120130825.28838-1-drjones@redhat.com
-
- 21 Oct, 2019 2 commits
-
-
Steven Price authored
Allow user space to inform the KVM host where in the physical memory map the paravirtualized time structures should be located. User space can set an attribute on the VCPU providing the IPA base address of the stolen time structure for that VCPU. This must be repeated for every VCPU in the VM. The address is given in terms of the physical address visible to the guest and must be 64 byte aligned. The guest will discover the address via a hypercall. Signed-off-by:
Steven Price <steven.price@arm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org>
-
Christoffer Dall authored
In some scenarios, such as buggy guest or incorrect configuration of the VMM and firmware description data, userspace will detect a memory access to a portion of the IPA, which is not mapped to any MMIO region. For this purpose, the appropriate action is to inject an external abort to the guest. The kernel already has functionality to inject an external abort, but we need to wire up a signal from user space that lets user space tell the kernel to do this. It turns out, we already have the set event functionality which we can perfectly reuse for this. Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org>
-
- 09 Sep, 2019 1 commit
-
-
Marc Zyngier authored
While parts of the VGIC support a large number of vcpus (we bravely allow up to 512), other parts are more limited. One of these limits is visible in the KVM_IRQ_LINE ioctl, which only allows 256 vcpus to be signalled when using the CPU or PPI types. Unfortunately, we've cornered ourselves badly by allocating all the bits in the irq field. Since the irq_type subfield (8 bit wide) is currently only taking the values 0, 1 and 2 (and we have been careful not to allow anything else), let's reduce this field to only 4 bits, and allocate the remaining 4 bits to a vcpu2_index, which acts as a multiplier: vcpu_id = 256 * vcpu2_index + vcpu_index With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2) allowing this to be discovered, it becomes possible to inject PPIs to up to 4096 vcpus. But please just don't. Whilst we're there, add a clarification about the use of KVM_IRQ_LINE on arm, which is not completely conditionned by KVM_CAP_IRQCHIP. Reported-by:
Zenghui Yu <yuzenghui@huawei.com> Reviewed-by:
Eric Auger <eric.auger@redhat.com> Reviewed-by:
Zenghui Yu <yuzenghui@huawei.com> Signed-off-by:
Marc Zyngier <maz@kernel.org>
-
- 05 Jul, 2019 1 commit
-
-
Andre Przywara authored
KVM implements the firmware interface for mitigating cache speculation vulnerabilities. Guests may use this interface to ensure mitigation is active. If we want to migrate such a guest to a host with a different support level for those workarounds, migration might need to fail, to ensure that critical guests don't loose their protection. Introduce a way for userland to save and restore the workarounds state. On restoring we do checks that make sure we don't downgrade our mitigation level. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Reviewed-by:
Eric Auger <eric.auger@redhat.com> Reviewed-by:
Steven Price <steven.price@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- 13 Jun, 2019 1 commit
-
-
Dave Martin authored
The in-memory representation of SVE and FPSIMD registers is different: the FPSIMD V-registers are stored as single 128-bit host-endian values, whereas SVE registers are stored in an endianness-invariant byte order. This means that the two representations differ when running on a big-endian host. But we blindly copy data from one representation to another when converting between the two, resulting in the register contents being unintentionally byteswapped in certain situations. Currently this can be triggered by the first SVE instruction after a syscall, for example (though the potential trigger points may vary in future). So, fix the conversion functions fpsimd_to_sve(), sve_to_fpsimd() and sve_sync_from_fpsimd_zeropad() to swab where appropriate. There is no common swahl128() or swab128() that we could use here. Maybe it would be worth making this generic, but for now add a simple local hack. Since the byte order differences are exposed in ABI, also clarify the documentation. Cc: Alex Bennée <alex.bennee@linaro.org> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Alan Hayward <alan.hayward@arm.com> Cc: Julien Grall <julien.grall@arm.com> Fixes: bc0ee476 ("arm64/sve: Core task context handling") Fixes: 8cd969d2 ("arm64/sve: Signal handling support") Fixes: 43d4da2c ("arm64/sve: ptrace and ELF coredump support") Signed-off-by:
Dave Martin <Dave.Martin@arm.com> [will: Fix typos in comments and docs spotted by Julien] Signed-off-by:
Will Deacon <will.deacon@arm.com>
-
- 24 Apr, 2019 1 commit
-
-
Amit Daniel Kachhap authored
Now that the building blocks of pointer authentication are present, lets add userspace flags KVM_ARM_VCPU_PTRAUTH_ADDRESS and KVM_ARM_VCPU_PTRAUTH_GENERIC. These flags will enable pointer authentication for the KVM guest on a per-vcpu basis through the ioctl KVM_ARM_VCPU_INIT. This features will allow the KVM guest to allow the handling of pointer authentication instructions or to treat them as undefined if not set. Necessary documentations are added to reflect the changes done. Reviewed-by:
Dave Martin <Dave.Martin@arm.com> Signed-off-by:
Amit Daniel Kachhap <amit.kachhap@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: kvmarm@lists.cs.columbia.edu Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- 18 Apr, 2019 2 commits
-
-
Dave Martin authored
A complicated DIV_ROUND_UP() expression is currently written out explicitly in multiple places in order to specify the size of the bitmap exchanged with userspace to represent the value of the KVM_REG_ARM64_SVE_VLS pseudo-register. Userspace currently has no direct way to work this out either: for documentation purposes, the size is just quoted as 8 u64s. To make this more intuitive, this patch replaces these with a single define, which is also exported to userspace as KVM_ARM64_SVE_VLS_WORDS. Since the number of words in a bitmap is just the index of the last word used + 1, this patch expresses the bound that way instead. This should make it clearer what is being expressed. For userspace convenience, the minimum and maximum possible vector lengths relevant to the KVM ABI are exposed to UAPI as KVM_ARM64_SVE_VQ_MIN, KVM_ARM64_SVE_VQ_MAX. Since the only direct use for these at present is manipulation of KVM_REG_ARM64_SVE_VLS, no corresponding _VL_ macros are defined. They could be added later if a need arises. Since use of DIV_ROUND_UP() was the only reason for including <linux/kernel.h> in guest.c, this patch also removes that #include. Suggested-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Dave Martin <Dave.Martin@arm.com> Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Dave Martin authored
Currently, the SVE register ID macros are not all defined in the same way, and advertise the fact that FFR maps onto the nonexistent predicate register P16. This is really just for kernel convenience, and may lead userspace into bad habits. Instead, this patch masks the ID macro arguments so that architecturally invalid register numbers will not be passed through any more, and uses a literal KVM_REG_ARM64_SVE_FFR_BASE macro to define KVM_REG_ARM64_SVE_FFR(), similarly to the way the _ZREG() and _PREG() macros are defined. Rather than plugging in magic numbers for the number of Z- and P- registers and the maximum possible number of register slices, this patch provides definitions for those too. Userspace is going to need them in any case, and it makes sense for them to come from <uapi/asm/kvm.h>. sve_reg_to_region() uses convenience constants that are defined in a different way, and also makes use of the fact that the FFR IDs are really contiguous with the P15 IDs, so this patch retains the existing convenience constants in guest.c, supplemented with a couple of sanity checks to check for consistency with the UAPI header. Fixes: e1c9c983 ("KVM: arm64/sve: Add SVE support to register access ioctl interface") Suggested-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Dave Martin <Dave.Martin@arm.com> Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- 29 Mar, 2019 2 commits
-
-
Dave Martin authored
This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to allow userspace to set and query the set of vector lengths visible to the guest. In the future, multiple register slices per SVE register may be visible through the ioctl interface. Once the set of slices has been determined we would not be able to allow the vector length set to be changed any more, in order to avoid userspace seeing inconsistent sets of registers. For this reason, this patch adds support for explicit finalization of the SVE configuration via the KVM_ARM_VCPU_FINALIZE ioctl. Finalization is the proper place to allocate the SVE register state storage in vcpu->arch.sve_state, so this patch adds that as appropriate. The data is freed via kvm_arch_vcpu_uninit(), which was previously a no-op on arm64. To simplify the logic for determining what vector lengths can be supported, some code is added to KVM init to work this out, in the kvm_arm_init_arch_resources() hook. The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet. Subsequent patches will allow SVE to be turned on for guest vcpus, making it visible. Signed-off-by:
Dave Martin <Dave.Martin@arm.com> Reviewed-by:
Julien Thierry <julien.thierry@arm.com> Tested-by:
zhang.lei <zhang.lei@jp.fujitsu.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Dave Martin authored
This patch adds the following registers for access via the KVM_{GET,SET}_ONE_REG interface: * KVM_REG_ARM64_SVE_ZREG(n, i) (n = 0..31) (in 2048-bit slices) * KVM_REG_ARM64_SVE_PREG(n, i) (n = 0..15) (in 256-bit slices) * KVM_REG_ARM64_SVE_FFR(i) (in 256-bit slices) In order to adapt gracefully to future architectural extensions, the registers are logically divided up into slices as noted above: the i parameter denotes the slice index. This allows us to reserve space in the ABI for future expansion of these registers. However, as of today the architecture does not permit registers to be larger than a single slice, so no code is needed in the kernel to expose additional slices, for now. The code can be extended later as needed to expose them up to a maximum of 32 slices (as carved out in the architecture itself) if they really exist someday. The registers are only visible for vcpus that have SVE enabled. They are not enumerated by KVM_GET_REG_LIST on vcpus that do not have SVE. Accesses to the FPSIMD registers via KVM_REG_ARM_CORE is not allowed for SVE-enabled vcpus: SVE-aware userspace can use the KVM_REG_ARM64_SVE_ZREG() interface instead to access the same register state. This avoids some complex and pointless emulation in the kernel to convert between the two views of these aliased registers. Signed-off-by:
Dave Martin <Dave.Martin@arm.com> Reviewed-by:
Julien Thierry <julien.thierry@arm.com> Tested-by:
zhang.lei <zhang.lei@jp.fujitsu.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- 21 Jul, 2018 1 commit
-
-
Dongjiu Geng authored
For the migrating VMs, user space may need to know the exception state. For example, in the machine A, KVM make an SError pending, when migrate to B, KVM also needs to pend an SError. This new IOCTL exports user-invisible states related to SError. Together with appropriate user space changes, user space can get/set the SError exception state to do migrate/snapshot/suspend. Signed-off-by:
Dongjiu Geng <gengdongjiu@huawei.com> Reviewed-by:
James Morse <james.morse@arm.com> [expanded documentation wording] Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- 25 May, 2018 1 commit
-
-
Eric Auger authored
This new attribute allows the userspace to set the base address of a reditributor region, relaxing the constraint of having all consecutive redistibutor frames contiguous. Signed-off-by:
Eric Auger <eric.auger@redhat.com> Acked-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- 20 Apr, 2018 1 commit
-
-
Marc Zyngier authored
Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1 or 1.0 to a guest, defaulting to the latest version of the PSCI implementation that is compatible with the requested version. This is no different from doing a firmware upgrade on KVM. But in order to give a chance to hypothetical badly implemented guests that would have a fit by discovering something other than PSCI 0.2, let's provide a new API that allows userspace to pick one particular version of the API. This is implemented as a new class of "firmware" registers, where we expose the PSCI version. This allows the PSCI version to be save/restored as part of a guest migration, and also set to any supported version if the guest requires it. Cc: stable@vger.kernel.org #4.16 Reviewed-by:
Christoffer Dall <cdall@kernel.org> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-