- 16 Dec, 2021 18 commits
-
-
Marc Zyngier authored
* kvm-arm64/pkvm-hyp-sharing: : . : Series from Quentin Perret, implementing HYP page share/unshare: : : This series implements an unshare hypercall at EL2 in nVHE : protected mode, and makes use of it to unmap guest-specific : data-structures from EL2 stage-1 during guest tear-down. : Crucially, the implementation of the share and unshare : routines uses page refcounts in the host kernel to avoid : accidentally unmapping data-structures that overlap a common : page. : [...] : . KVM: arm64: pkvm: Unshare guest structs during teardown KVM: arm64: Expose unshare hypercall to the host KVM: arm64: Implement do_unshare() helper for unsharing memory KVM: arm64: Implement __pkvm_host_share_hyp() using do_share() KVM: arm64: Implement do_share() helper for sharing memory KVM: arm64: Introduce wrappers for host and hyp spin lock accessors KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages KVM: arm64: pkvm: Refcount the pages shared with EL2 KVM: arm64: Introduce kvm_share_hyp() KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2 KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table KVM: arm64: Fixup hyp stage-1 refcount KVM: arm64: Refcount hyp stage-1 pgtable pages KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Quentin Perret authored
Make use of the newly introduced unshare hypercall during guest teardown to unmap guest-related data structures from the hyp stage-1. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-15-qperret@google.com
-
Will Deacon authored
Introduce an unshare hypercall which can be used to unmap memory from the hypervisor stage-1 in nVHE protected mode. This will be useful to update the EL2 ownership state of pages during guest teardown, and avoids keeping dangling mappings to unreferenced portions of memory. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-14-qperret@google.com
-
Will Deacon authored
Tearing down a previously shared memory region results in the borrower losing access to the underlying pages and returning them to the "owned" state in the owner. Implement a do_unshare() helper, along the same lines as do_share(), to provide this functionality for the host-to-hyp case. Reviewed-by: Andrew Walbran <qwandor@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-13-qperret@google.com
-
Will Deacon authored
__pkvm_host_share_hyp() shares memory between the host and the hypervisor so implement it as an invocation of the new do_share() mechanism. Note that double-sharing is no longer permitted (as this allows us to reduce the number of page-table walks significantly), but is thankfully no longer relied upon by the host. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-12-qperret@google.com
-
Will Deacon authored
By default, protected KVM isolates memory pages so that they are accessible only to their owner: be it the host kernel, the hypervisor at EL2 or (in future) the guest. Establishing shared-memory regions between these components therefore involves a transition for each page so that the owner can share memory with a borrower under a certain set of permissions. Introduce a do_share() helper for safely sharing a memory region between two components. Currently, only host-to-hyp sharing is implemented, but the code is easily extended to handle other combinations and the permission checks for each component are reusable. Reviewed-by: Andrew Walbran <qwandor@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-11-qperret@google.com
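The per-page transition described above can be sketched as a check-then-commit step. This is a minimal model under stated assumptions, not the kernel's implementation: the enum values, `struct page_view` and `sketch_do_share()` are hypothetical names, and each component's view of a page is reduced to a single state.

```c
#include <assert.h>

/* Hypothetical per-component view of one page; names are illustrative. */
enum page_state {
	PAGE_OWNED,           /* exclusively owned by this component */
	PAGE_SHARED_OWNED,    /* owned here, shared with a borrower */
	PAGE_SHARED_BORROWED, /* borrowed from another component */
	PAGE_NOPAGE,          /* this component has no claim on the page */
};

struct page_view {
	enum page_state owner;    /* e.g. the host's view of the page */
	enum page_state borrower; /* e.g. the hypervisor's view of the page */
};

/* Check both sides first, then commit, so a failed share changes nothing. */
static int sketch_do_share(struct page_view *p)
{
	assert(p);

	/* The owner must hold the page exclusively and the borrower must
	 * not already map it; this also rejects sharing the page twice. */
	if (p->owner != PAGE_OWNED || p->borrower != PAGE_NOPAGE)
		return -1;

	p->owner = PAGE_SHARED_OWNED;
	p->borrower = PAGE_SHARED_BORROWED;
	return 0;
}
```

Keeping the checks separate from the state update is what would let the same permission checks be reused for other owner/borrower combinations.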
-
Will Deacon authored
In preparation for adding additional locked sections for manipulating page-tables at EL2, introduce some simple wrappers around the host and hypervisor locks so that it's a bit easier to read and a bit more difficult to take the wrong lock (or even take them in the wrong order). Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-10-qperret@google.com
-
Will Deacon authored
Explicitly name the combination of SW0 | SW1 as reserved in the pte and introduce a new PKVM_NOPAGE meta-state which, although not directly stored in the software bits of the pte, can be used to represent an entry for which there is no underlying page. This is distinct from an invalid pte, as stage-2 identity mappings for the host are created lazily and so an invalid pte there is the same as a valid mapping for the purposes of ownership information. This state will be used for permission checking during page transitions in later patches. Reviewed-by: Andrew Walbran <qwandor@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-9-qperret@google.com
-
Quentin Perret authored
In order to simplify the page tracking infrastructure at EL2 in nVHE protected mode, move the responsibility of refcounting pages that are shared multiple times on the host. In order to do so, let's create a red-black tree tracking all the PFNs that have been shared, along with a refcount. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-8-qperret@google.com
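The host-side bookkeeping can be illustrated with a small sketch. It assumes a fixed-size array in place of the red-black tree named in the commit message, and plain counters in place of the share and unshare hypercalls; every identifier below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SHARED 16

/* One tracked page: its PFN and how many times the host has shared it. */
struct shared_pfn {
	uint64_t pfn;
	unsigned int count;
};

static struct shared_pfn shared_table[MAX_SHARED];
static unsigned int nr_shared;
/* Counters standing in for the real share/unshare hypercalls. */
static unsigned int nr_share_calls, nr_unshare_calls;

static struct shared_pfn *find_shared_pfn(uint64_t pfn)
{
	for (unsigned int i = 0; i < nr_shared; i++)
		if (shared_table[i].pfn == pfn)
			return &shared_table[i];
	return 0;
}

/* Only the first share of a PFN reaches the hypervisor. */
static int share_pfn_sketch(uint64_t pfn)
{
	struct shared_pfn *e = find_shared_pfn(pfn);

	if (e) {
		e->count++;
		return 0;
	}
	if (nr_shared == MAX_SHARED)
		return -1;
	nr_share_calls++;
	shared_table[nr_shared++] = (struct shared_pfn){ .pfn = pfn, .count = 1 };
	return 0;
}

/* Only the last unshare of a PFN reaches the hypervisor. */
static int unshare_pfn_sketch(uint64_t pfn)
{
	struct shared_pfn *e = find_shared_pfn(pfn);

	if (!e)
		return -1;
	assert(e->count > 0);
	if (--e->count)
		return 0;
	nr_unshare_calls++;
	*e = shared_table[--nr_shared]; /* fill the hole with the last entry */
	return 0;
}
```

With the count kept on the host, EL2 only ever sees one share and one unshare per page, which is the simplification the commit message describes.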
-
Quentin Perret authored
The create_hyp_mappings() function can currently be called at any point in time. However, its behaviour in protected mode changes widely depending on when it is being called. Prior to KVM init, it is used to create the temporary page-table used to bring-up the hypervisor, and later on it is transparently turned into a 'share' hypercall when the kernel has lost control over the hypervisor stage-1. In order to prepare the ground for also unsharing pages with the hypervisor during guest teardown, introduce a kvm_share_hyp() function to make it clear in which places a share hypercall should be expected, as we will soon need a matching unshare hypercall in all those places. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-7-qperret@google.com
-
Will Deacon authored
Implement kvm_pgtable_hyp_unmap() which can be used to remove hypervisor stage-1 mappings at EL2. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-6-qperret@google.com
-
Will Deacon authored
kvm_pgtable_hyp_unmap() relies on the ->page_count() function callback being provided by the memory-management operations for the page-table. Wire up this callback for the hypervisor stage-1 page-table. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-5-qperret@google.com
-
Quentin Perret authored
In nVHE-protected mode, the hyp stage-1 page-table refcount is broken due to the lack of refcount support in the early allocator. Fix-up the refcount in the finalize walker, once the 'hyp_vmemmap' is up and running. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-4-qperret@google.com
-
Quentin Perret authored
To prepare the ground for allowing hyp stage-1 mappings to be removed at run-time, update the KVM page-table code to maintain a correct refcount using the ->{get,put}_page() function callbacks. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-3-qperret@google.com
-
Quentin Perret authored
In nVHE protected mode, the EL2 code uses a temporary allocator during boot while re-creating its stage-1 page-table. Unfortunately, the hyp_vmemmap is not ready to use at this stage, so refcounting pages is not possible. That is not currently a problem because hyp stage-1 mappings are never removed, which implies refcounting of page-table pages is unnecessary. In preparation for allowing hypervisor stage-1 mappings to be removed, provide stub implementations for {get,put}_page() in the early allocator. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-2-qperret@google.com
-
Marc Zyngier authored
* kvm-arm64/vgic-fixes-5.17: : . : A few vgic fixes: : - Harden vgic-v3 error handling paths against signed vs unsigned : comparison that will happen once the xarray-based vcpus are in : - Demote userspace-triggered console output to kvm_debug() : . KVM: arm64: vgic: Demote userspace-triggered console prints to kvm_debug() KVM: arm64: vgic-v3: Fix vcpu index comparison Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Marc Zyngier authored
Running the KVM selftests results in these messages being dumped in the kernel console: [ 188.051073] kvm [469]: VGIC redist and dist frames overlap [ 188.056820] kvm [469]: VGIC redist and dist frames overlap [ 188.076199] kvm [469]: VGIC redist and dist frames overlap Being able to trigger this from userspace is definitely not on, so demote these warnings to kvm_debug(). Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211216104507.1482017-1-maz@kernel.org
-
Marc Zyngier authored
When handling an error at the point where we try and register all the redistributors, we unregister all the previously registered frames by counting down from the failing index. However, the way the code is written relies on that index being a signed value, which won't be true once we switch to an xarray-based vcpu set. Since this code is pretty awkward in the first place, and the failure mode is hard to spot, rewrite this loop to iterate over the vcpus upwards rather than downwards. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211216104526.1482124-1-maz@kernel.org
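The hazard and the upward rewrite can be sketched in isolation. This is an illustrative model only: the frame array, `register_frame_sketch()` and the injected failure index are hypothetical stand-ins for the redistributor registration path.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_FRAMES 4UL

static bool frame_registered[NR_FRAMES];

/* Register frame i, failing on purpose when i == fail_at. */
static int register_frame_sketch(unsigned long i, unsigned long fail_at)
{
	assert(i < NR_FRAMES);
	if (i == fail_at)
		return -1;
	frame_registered[i] = true;
	return 0;
}

static int register_all_sketch(unsigned long fail_at)
{
	unsigned long i, c;
	int ret = 0;

	for (c = 0; c < NR_FRAMES; c++) {
		ret = register_frame_sketch(c, fail_at);
		if (ret)
			break;
	}

	if (ret) {
		/*
		 * With an unsigned index, a countdown such as
		 * "for (c--; c >= 0; c--)" has a condition that is always
		 * true. Walking upwards to the failing index avoids relying
		 * on signedness at all.
		 */
		for (i = 0; i < c; i++)
			frame_registered[i] = false;
	}
	return ret;
}

static unsigned long count_registered_sketch(void)
{
	unsigned long i, n = 0;

	for (i = 0; i < NR_FRAMES; i++)
		n += frame_registered[i];
	return n;
}
```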
-
- 15 Dec, 2021 7 commits
-
-
Marc Zyngier authored
* kvm-arm64/pkvm-cleanups-5.17: : . : pKVM cleanups from Quentin Perret: : : This series is a collection of various fixes and cleanups for KVM/arm64 : when running in nVHE protected mode. The first two patches are real : fixes/improvements, the following two are minor cleanups, and the last : two help satisfy my paranoia so they're certainly optional. : . KVM: arm64: pkvm: Make kvm_host_owns_hyp_mappings() robust to VHE KVM: arm64: pkvm: Stub io map functions KVM: arm64: Make __io_map_base static KVM: arm64: Make the hyp memory pool static KVM: arm64: pkvm: Disable GICv2 support KVM: arm64: pkvm: Fix hyp_pool max order Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Quentin Perret authored
The kvm_host_owns_hyp_mappings() function should return true if and only if the host kernel is responsible for creating the hypervisor stage-1 mappings. That is only possible in standard non-VHE mode, or during boot in protected nVHE mode. But either way, none of this makes sense in VHE, so make sure to catch this case as well, hence making the function return sensible values in any context (VHE or not). Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-7-qperret@google.com
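The rule stated above reduces to a small decision table. The sketch below models it as a pure function, assuming a hypothetical mode enum and a flag saying whether the protected hypervisor has finished initializing; the kernel function reads this information from system state rather than taking it as arguments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of how KVM/arm64 is running. */
enum kvm_mode_sketch {
	MODE_VHE,            /* host kernel runs at EL2 */
	MODE_NVHE,           /* standard non-VHE mode */
	MODE_PROTECTED_NVHE, /* protected nVHE mode (pKVM) */
};

/* True if and only if the host creates the hypervisor stage-1 mappings. */
static bool host_owns_hyp_mappings_sketch(enum kvm_mode_sketch mode,
					  bool hyp_initialized)
{
	switch (mode) {
	case MODE_VHE:
		return false; /* the question makes no sense with VHE */
	case MODE_NVHE:
		return true;
	case MODE_PROTECTED_NVHE:
		return !hyp_initialized; /* only during boot */
	}
	assert(0 && "unknown mode");
	return false;
}
```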
-
Quentin Perret authored
Now that GICv2 is disabled in nVHE protected mode there should be no other reason for the host to use create_hyp_io_mappings() or kvm_phys_addr_ioremap(). Add sanity checks to make sure that assumption remains true looking forward. Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-6-qperret@google.com
-
Quentin Perret authored
The __io_map_base variable is used at EL2 to track the end of the hypervisor's "private" VA range in nVHE protected mode. However it doesn't need to be used outside of mm.c, so let's make it static to keep all the hyp VA allocation logic in one place. Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-5-qperret@google.com
-
Quentin Perret authored
The hyp memory pool struct is sized to fit exactly the needs of the hypervisor stage-1 page-table allocator, so it is important it is not used for anything else. As it is currently used only from setup.c, reduce its visibility by marking it static. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Andrew Walbran <qwandor@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-4-qperret@google.com
-
Quentin Perret authored
GICv2 requires having device mappings in guests and the hypervisor, which is incompatible with the current pKVM EL2 page ownership model which only covers memory. While it would be desirable to support pKVM with GICv2, this will require a lot more work, so let's make the current assumption clear until then. Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-3-qperret@google.com
-
Quentin Perret authored
The EL2 page allocator in protected mode maintains a per-pool max order value to optimize allocations when the memory region it covers is small. However, the max order value is currently under-estimated whenever the number of pages in the region is a power of two. Fix the estimation. Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-2-qperret@google.com
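A short worked example shows why powers of two are the problem case. It assumes the per-pool max order acts as an exclusive upper bound (allocatable orders are strictly below it), and it uses hypothetical helpers rather than the kernel's own; the two formulas are one plausible way to express the under-estimate and its correction, not a quote of the patch.

```c
#include <assert.h>

/* Smallest order such that (1 << order) pages cover nr_pages. */
static unsigned int order_covering(unsigned long nr_pages)
{
	unsigned int order = 0;

	assert(nr_pages > 0);
	while ((1UL << order) < nr_pages)
		order++;
	return order;
}

/*
 * Under-estimate: for an 8-page region this yields 3, so with an
 * exclusive bound the largest usable block is order 2 (4 pages).
 */
static unsigned int max_order_underestimated(unsigned long nr_pages)
{
	return order_covering(nr_pages);
}

/*
 * Corrected estimate: bumping the page count by one only changes the
 * result when nr_pages is a power of two, yielding 4 for 8 pages so
 * that an order-3 block (the whole region) becomes usable.
 */
static unsigned int max_order_fixed(unsigned long nr_pages)
{
	return order_covering(nr_pages + 1);
}
```

For a 6-page region both functions return 3, so only power-of-two sizes are affected.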
-
- 07 Dec, 2021 3 commits
-
-
Marc Zyngier authored
* kvm-arm64/misc-5.17: : . : - Add minimal support for ARMv8.7's PMU extension : - Constify kvm_io_gic_ops : - Drop kvm_is_transparent_hugepage() prototype : . KVM: Drop stale kvm_is_transparent_hugepage() declaration KVM: arm64: Constify kvm_io_gic_ops KVM: arm64: Add minimal handling for the ARMv8.7 PMU Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Marc Zyngier authored
* kvm-arm64/hyp-header-split: : . : Tidy up the header file usage for the nvhe hyp object so : that header files under arch/arm64/kvm/hyp/include are not : included by host code running at EL1. : . KVM: arm64: Move host EL1 code out of hyp/ directory KVM: arm64: Generate hyp_constants.h for the host arm64: Add missing include of asm/cpufeature.h to asm/mmu.h Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Vitaly Kuznetsov authored
kvm_is_transparent_hugepage() was removed in commit 205d76ff ("KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()") but its declaration in include/linux/kvm_host.h persisted. Drop it. Fixes: 205d76ff ("KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211018151407.2107363-1-vkuznets@redhat.com
-
- 06 Dec, 2021 4 commits
-
-
Will Deacon authored
kvm/hyp/reserved_mem.c contains host code executing at EL1 and is not linked into the hypervisor object. Move the file into kvm/pkvm.c and rework the headers so that the definitions shared between the host and the hypervisor live in asm/kvm_pkvm.h. Signed-off-by: Will Deacon <will@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211202171048.26924-4-will@kernel.org
-
Will Deacon authored
In order to avoid exposing hypervisor (EL2) data structures directly to the host, generate hyp_constants.h to provide constants such as structure sizes to the host without dragging in the definitions themselves. Signed-off-by: Will Deacon <will@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211202171048.26924-3-will@kernel.org
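The idea of exporting sizes without definitions can be sketched as a tiny generator step. This is an assumption-laden illustration: `struct hyp_private_sketch`, `emit_constant_sketch()` and the macro name used in the usage note are hypothetical, and the kernel's build machinery for producing the header is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a hypervisor-private structure the host should not see. */
struct hyp_private_sketch {
	unsigned short refcount;
	unsigned short order;
};

/*
 * Format one "#define NAME VALUE" line into buf. A generator built with
 * access to the private definition would call this with sizeof(), so
 * that consumers of the generated header only ever see the number.
 * Returns the line length, or -1 if buf is too small.
 */
static int emit_constant_sketch(char *buf, size_t len, const char *name,
				size_t value)
{
	int n = snprintf(buf, len, "#define %s %zu\n", name, value);

	assert(n >= 0);
	return (size_t)n < len ? n : -1;
}
```

A generator would call `emit_constant_sketch(buf, sizeof(buf), "STRUCT_HYP_PRIVATE_SIZE", sizeof(struct hyp_private_sketch))` and write the resulting line to the generated header.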
-
Will Deacon authored
asm/mmu.h refers to cpus_have_const_cap() in the definition of arm64_kernel_unmapped_at_el0() so include asm/cpufeature.h directly rather than force all users of the header to do it themselves. Signed-off-by: Will Deacon <will@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211202171048.26924-2-will@kernel.org
-
Rikard Falkeborn authored
The only usage of kvm_io_gic_ops is to make a comparison with its address and to pass its address to kvm_iodevice_init() which takes a pointer to const kvm_io_device_ops as input. Make it const to allow the compiler to put it in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211204213518.83642-1-rikard.falkeborn@gmail.com
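The pattern is easy to show in miniature. The types and functions below are hypothetical stand-ins, not the KVM definitions; the point is only that an ops table which is compared by address and passed as a pointer-to-const can itself be declared `const`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ops table and device, loosely modelled on an I/O device. */
struct io_device_ops_sketch {
	int (*read)(int addr);
};

struct io_device_sketch {
	const struct io_device_ops_sketch *ops;
};

static int gic_read_sketch(int addr)
{
	return addr + 1;
}

/* const lets the compiler place the table in read-only memory. */
static const struct io_device_ops_sketch gic_ops_sketch = {
	.read = gic_read_sketch,
};

/* The init function already takes a pointer to const, so no cast is needed. */
static void io_device_init_sketch(struct io_device_sketch *dev,
				  const struct io_device_ops_sketch *ops)
{
	assert(dev && ops);
	dev->ops = ops;
}

/* Identity check by address, the other use mentioned in the commit message. */
static bool is_gic_device_sketch(const struct io_device_sketch *dev)
{
	return dev->ops == &gic_ops_sketch;
}
```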
-
- 01 Dec, 2021 8 commits
-
-
Marc Zyngier authored
When running a KVM guest hosted on an ARMv8.7 machine, the host kernel complains that it doesn't know about the architected number of events. Fix it by adding the PMUver code corresponding to PMUv3 for ARMv8.7. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211126115533.217903-1-maz@kernel.org
-
Marc Zyngier authored
* kvm-arm64/fpsimd-tracking: : . : Simplify the handling of both the FP/SIMD and SVE state by : removing the need for mapping the thread at EL2, and by : dropping the tracking of the host's SVE state which is : always invalid by construction. : . arm64/fpsimd: Document the use of TIF_FOREIGN_FPSTATE by KVM KVM: arm64: Stop mapping current thread_info at EL2 KVM: arm64: Introduce flag shadowing TIF_FOREIGN_FPSTATE KVM: arm64: Remove unused __sve_save_state KVM: arm64: Get rid of host SVE tracking/saving KVM: arm64: Reorder vcpu flag definitions Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Marc Zyngier authored
* kvm-arm64/vcpu-first-run: : Rework the "vcpu first run" sequence to be driven by KVM's : "PID change" callback, removing the need for extra state. KVM: arm64: Drop vcpu->arch.has_run_once for vcpu->pid KVM: arm64: Merge kvm_arch_vcpu_run_pid_change() and kvm_vcpu_first_run_init() KVM: arm64: Restructure the point where has_run_once is advertised KVM: arm64: Move kvm_arch_vcpu_run_pid_change() out of line KVM: arm64: Move SVE state mapping at HYP to finalize-time Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Marc Zyngier authored
With the transition to kvm_arch_vcpu_run_pid_change() to handle the "run once" activities, it becomes obvious that has_run_once is now an exact shadow of vcpu->pid. Replace vcpu->arch.has_run_once with a new vcpu_has_run_once() helper that directly checks for vcpu->pid, and get rid of the now unused field. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
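The observation that the flag merely shadows the PID can be sketched as follows. The structure and function names are hypothetical, and the PID is modelled as an opaque pointer that starts out NULL; the kernel's real helper and its pointer access rules are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vcpu with only the field that matters for this sketch. */
struct vcpu_sketch {
	const void *pid; /* NULL until the vcpu has run for the first time */
};

/* "Has run once" is derived from the PID instead of a separate flag. */
static bool vcpu_has_run_once_sketch(const struct vcpu_sketch *vcpu)
{
	assert(vcpu);
	return vcpu->pid != NULL;
}

/* Running the vcpu records the current task; the first run is a PID change. */
static void vcpu_run_sketch(struct vcpu_sketch *vcpu, const void *current_pid)
{
	if (vcpu->pid != current_pid) {
		/* Work tied to a PID change, including first-run setup,
		 * would happen here. */
		vcpu->pid = current_pid;
	}
}
```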
-
Marc Zyngier authored
The kvm_arch_vcpu_run_pid_change() helper gets called on each PID change. The kvm_vcpu_first_run_init() helper gets run on the... first run(!) of a vcpu. As it turns out, the first run of a vcpu also triggers a PID change event (vcpu->pid is initially NULL). Use this property to merge these two helpers and get rid of another arm64-specific oddity. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Marc Zyngier authored
Restructure kvm_vcpu_first_run_init() to set the has_run_once flag after having completed all the "run once" activities. This includes moving the flip of the userspace irqchip static key to a point where nothing can fail. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Marc Zyngier authored
Having kvm_arch_vcpu_run_pid_change() inline doesn't bring anything to the table. Move it next to kvm_vcpu_first_run_init(), which will be convenient for what is next to come. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Marc Zyngier authored
We currently map the SVE state to HYP on detection of a PID change. Although this matches what we do for FPSIMD, this is pretty pointless for SVE, as the buffer is per-vcpu and has nothing to do with the thread that is being run. Move the mapping of the SVE state to finalize-time, which is where we allocate the state memory, and thus the most logical place to do this. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
-