Commit 5fc3424f authored by Sean Christopherson, committed by Paolo Bonzini

KVM: x86/mmu: Make Host-writable and MMU-writable bit locations dynamic

Make the locations of the HOST_WRITABLE and MMU_WRITABLE bits configurable for
a given KVM instance.  This will allow EPT to use high available bits,
which in turn will free up bit 11 for a constant MMU_PRESENT bit.

No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210225204749.1512652-19-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
parent e7b7bdea
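
As a rough standalone sketch of the shape of the change (plain C, not kernel code; the main() harness and the standalone types are assumptions made only for illustration), the two writable-tracking bits become runtime masks, initialized to their legacy positions and consulted through helpers such as spte_can_locklessly_be_made_writable():

/*
 * Standalone sketch of the shape of this patch (plain C, not kernel code):
 * the writable-tracking bits become runtime masks initialized to the legacy
 * positions, replacing the compile-time SPTE_*_WRITEABLE constants so that
 * a different MMU flavor can relocate them later.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BIT_ULL(n)	(1ULL << (n))

/* Default (legacy) locations, as introduced by this patch. */
#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(10)
#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(11)

static uint64_t shadow_host_writable_mask;
static uint64_t shadow_mmu_writable_mask;

static void reset_all_pte_masks(void)
{
	shadow_host_writable_mask = DEFAULT_SPTE_HOST_WRITEABLE;
	shadow_mmu_writable_mask  = DEFAULT_SPTE_MMU_WRITEABLE;
}

/* Mirrors the reworked helper: both tracking bits must be set. */
static bool spte_can_locklessly_be_made_writable(uint64_t spte)
{
	return (spte & shadow_host_writable_mask) &&
	       (spte & shadow_mmu_writable_mask);
}

int main(void)
{
	reset_all_pte_masks();

	uint64_t spte = DEFAULT_SPTE_HOST_WRITEABLE | DEFAULT_SPTE_MMU_WRITEABLE;

	printf("lockless writable: %d\n", spte_can_locklessly_be_made_writable(spte));
	return 0;
}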
@@ -44,18 +44,18 @@ following two cases:
 2. Write-Protection: The SPTE is present and the fault is caused by
    write-protect. That means we just need to change the W bit of the spte.
 
-What we use to avoid all the race is the SPTE_HOST_WRITEABLE bit and
-SPTE_MMU_WRITEABLE bit on the spte:
-- SPTE_HOST_WRITEABLE means the gfn is writable on host.
-- SPTE_MMU_WRITEABLE means the gfn is writable on mmu. The bit is set when
-  the gfn is writable on guest mmu and it is not write-protected by shadow
-  page write-protection.
+What we use to avoid all the race is the Host-writable bit and MMU-writable bit
+on the spte:
+- Host-writable means the gfn is writable in the host kernel page tables and in
+  its KVM memslot.
+- MMU-writable means the gfn is writable in the guest's mmu and it is not
+  write-protected by shadow page write-protection.
 
 On fast page fault path, we will use cmpxchg to atomically set the spte W
-bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_WRITE_PROTECT = 1, to
-restore the saved R/X bits if for an access-traced spte, or both. This is
-safe because whenever changing these bits can be detected by cmpxchg.
+bit if spte.HOST_WRITEABLE = 1 and spte.WRITE_PROTECT = 1, to restore the saved
+R/X bits if for an access-traced spte, or both. This is safe because whenever
+changing these bits can be detected by cmpxchg.
 
 But we need carefully check these cases:
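
The fast page fault rule documented above can be sketched in isolation as follows. fast_pf_fix_write() is a hypothetical helper used only for illustration (it is not the kernel's fast_page_fault()), and the bit positions are the defaults from this patch:

/*
 * Minimal sketch of the lockless rule above: restore the W bit via cmpxchg
 * only when both writable-tracking bits are set, so a racing write-protect
 * is detected as a cmpxchg failure and falls back to the slow path under
 * mmu_lock.  This is not the kernel's fast_page_fault().
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_WRITABLE_MASK	(1ULL << 1)	/* hardware R/W bit */

static uint64_t shadow_host_writable_mask = 1ULL << 10;	/* default location */
static uint64_t shadow_mmu_writable_mask  = 1ULL << 11;	/* default location */

static bool fast_pf_fix_write(_Atomic uint64_t *sptep, uint64_t old_spte)
{
	if (!(old_spte & shadow_host_writable_mask) ||
	    !(old_spte & shadow_mmu_writable_mask))
		return false;	/* not safe to fix locklessly */

	uint64_t new_spte = old_spte | PT_WRITABLE_MASK;

	/* Fails if another CPU changed the SPTE, e.g. write-protected it. */
	return atomic_compare_exchange_strong(sptep, &old_spte, new_spte);
}

int main(void)
{
	/* A write-protected SPTE with both tracking bits still set. */
	_Atomic uint64_t spte = (1ULL << 10) | (1ULL << 11);

	return fast_pf_fix_write(&spte, spte) ? 0 : 1;
}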
@@ -129,7 +129,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
  * write-protects guest page to sync the guest modification, b) another one is
  * used to sync dirty bitmap when we do KVM_GET_DIRTY_LOG. The differences
  * between these two sorts are:
- * 1) the first case clears SPTE_MMU_WRITEABLE bit.
+ * 1) the first case clears MMU-writable bit.
  * 2) the first case requires flushing tlb immediately avoiding corrupting
  *    shadow page table between all vcpus so it should be in the protection of
  *    mmu-lock. And the another case does not need to flush tlb until returning
@@ -140,17 +140,17 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
  * So, there is the problem: the first case can meet the corrupted tlb caused
  * by another case which write-protects pages but without flush tlb
  * immediately. In order to making the first case be aware this problem we let
- * it flush tlb if we try to write-protect a spte whose SPTE_MMU_WRITEABLE bit
- * is set, it works since another case never touches SPTE_MMU_WRITEABLE bit.
+ * it flush tlb if we try to write-protect a spte whose MMU-writable bit
+ * is set, it works since another case never touches MMU-writable bit.
  *
  * Anyway, whenever a spte is updated (only permission and status bits are
- * changed) we need to check whether the spte with SPTE_MMU_WRITEABLE becomes
+ * changed) we need to check whether the spte with MMU-writable becomes
  * readonly, if that happens, we need to flush tlb. Fortunately,
  * mmu_spte_update() has already handled it perfectly.
  *
- * The rules to use SPTE_MMU_WRITEABLE and PT_WRITABLE_MASK:
+ * The rules to use MMU-writable and PT_WRITABLE_MASK:
  * - if we want to see if it has writable tlb entry or if the spte can be
- *   writable on the mmu mapping, check SPTE_MMU_WRITEABLE, this is the most
+ *   writable on the mmu mapping, check MMU-writable, this is the most
  *   case, otherwise
  * - if we fix page fault on the spte or do write-protection by dirty logging,
  *   check PT_WRITABLE_MASK.
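
The flush rule spelled out above, that an SPTE which could be made writable locklessly must trigger a TLB flush when it becomes read-only, reduces to a check along these lines. spte_update_needs_flush() is a hypothetical name that mirrors only the write-permission part of mmu_spte_update():

/*
 * Sketch of the rule above: a TLB flush is needed when the old SPTE could
 * have been made writable locklessly but the new SPTE no longer grants
 * write access.  mmu_spte_update() also handles accessed/dirty transitions,
 * which this sketch deliberately ignores.
 */
#include <stdbool.h>
#include <stdint.h>

#define PT_WRITABLE_MASK	(1ULL << 1)

static uint64_t shadow_host_writable_mask = 1ULL << 10;
static uint64_t shadow_mmu_writable_mask  = 1ULL << 11;

static bool spte_update_needs_flush(uint64_t old_spte, uint64_t new_spte)
{
	bool was_lockless_writable = (old_spte & shadow_host_writable_mask) &&
				     (old_spte & shadow_mmu_writable_mask);

	return was_lockless_writable && !(new_spte & PT_WRITABLE_MASK);
}

int main(void)
{
	uint64_t old = shadow_host_writable_mask | shadow_mmu_writable_mask |
		       PT_WRITABLE_MASK;

	/* Dropping the W bit from a lockless-writable SPTE requires a flush. */
	return spte_update_needs_flush(old, old & ~PT_WRITABLE_MASK) ? 0 : 1;
}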
@@ -1107,7 +1107,7 @@ static bool spte_write_protect(u64 *sptep, bool pt_protect)
 	rmap_printk("spte %p %llx\n", sptep, *sptep);
 
 	if (pt_protect)
-		spte &= ~SPTE_MMU_WRITEABLE;
+		spte &= ~shadow_mmu_writable_mask;
 	spte = spte & ~PT_WRITABLE_MASK;
 
 	return mmu_spte_update(sptep, spte);
@@ -5529,9 +5529,9 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 	 * spte from present to present (changing the spte from present
 	 * to nonpresent will flush all the TLBs immediately), in other
 	 * words, the only case we care is mmu_spte_update() where we
-	 * have checked SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE
-	 * instead of PT_WRITABLE_MASK, that means it does not depend
-	 * on PT_WRITABLE_MASK anymore.
+	 * have checked Host-writable | MMU-writable instead of
+	 * PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK
+	 * anymore.
 	 */
 	if (flush)
 		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
@@ -1085,7 +1085,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 
 		nr_present++;
 
-		host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;
+		host_writable = sp->spt[i] & shadow_host_writable_mask;
 
 		set_spte_ret |= set_spte(vcpu, &sp->spt[i],
 					 pte_access, PG_LEVEL_4K,
@@ -21,6 +21,8 @@
 static bool __read_mostly enable_mmio_caching = true;
 module_param_named(mmio_caching, enable_mmio_caching, bool, 0444);
 
+u64 __read_mostly shadow_host_writable_mask;
+u64 __read_mostly shadow_mmu_writable_mask;
 u64 __read_mostly shadow_nx_mask;
 u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 u64 __read_mostly shadow_user_mask;
@@ -137,7 +139,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 			kvm_is_mmio_pfn(pfn));
 
 	if (host_writable)
-		spte |= SPTE_HOST_WRITEABLE;
+		spte |= shadow_host_writable_mask;
 	else
 		pte_access &= ~ACC_WRITE_MASK;
 
@@ -147,7 +149,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 	spte |= (u64)pfn << PAGE_SHIFT;
 
 	if (pte_access & ACC_WRITE_MASK) {
-		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
+		spte |= PT_WRITABLE_MASK | shadow_mmu_writable_mask;
 
 		/*
 		 * Optimization: for pte sync, if spte was writable the hash
@@ -163,7 +165,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 				 __func__, gfn);
 			ret |= SET_SPTE_WRITE_PROTECTED_PT;
 			pte_access &= ~ACC_WRITE_MASK;
-			spte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE);
+			spte &= ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
 		}
 	}
 
@@ -202,7 +204,7 @@ u64 kvm_mmu_changed_pte_notifier_make_spte(u64 old_spte, kvm_pfn_t new_pfn)
 	new_spte |= (u64)new_pfn << PAGE_SHIFT;
 
 	new_spte &= ~PT_WRITABLE_MASK;
-	new_spte &= ~SPTE_HOST_WRITEABLE;
+	new_spte &= ~shadow_host_writable_mask;
 
 	new_spte = mark_spte_for_access_track(new_spte);
@@ -342,6 +344,9 @@ void kvm_mmu_reset_all_pte_masks(void)
 	shadow_acc_track_mask	= 0;
 	shadow_me_mask		= sme_me_mask;
 
+	shadow_host_writable_mask = DEFAULT_SPTE_HOST_WRITEABLE;
+	shadow_mmu_writable_mask  = DEFAULT_SPTE_MMU_WRITEABLE;
+
 	/*
 	 * Set a reserved PA bit in MMIO SPTEs to generate page faults with
 	 * PFEC.RSVD=1 on MMIO accesses. 64-bit PTEs (PAE, x86-64, and EPT
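
With kvm_mmu_reset_all_pte_masks() assigning the masks at runtime, a later change can relocate them to high available bits for EPT, which is the stated goal of the changelog. The helper name and the bit numbers (57/58) below are assumptions made for this sketch and are not part of this patch:

#include <stdint.h>

#define BIT_ULL(n)	(1ULL << (n))

/* Assumed high available bit positions, for illustration only. */
#define EPT_SPTE_HOST_WRITABLE	BIT_ULL(57)
#define EPT_SPTE_MMU_WRITABLE	BIT_ULL(58)

static uint64_t shadow_host_writable_mask;
static uint64_t shadow_mmu_writable_mask;

/* Hypothetical EPT-flavored counterpart to kvm_mmu_reset_all_pte_masks(). */
static void set_ept_spte_masks(void)
{
	shadow_host_writable_mask = EPT_SPTE_HOST_WRITABLE;
	shadow_mmu_writable_mask  = EPT_SPTE_MMU_WRITABLE;
}

int main(void)
{
	set_ept_spte_masks();
	/* Bit 11 is now free for a constant MMU_PRESENT bit. */
	return (shadow_mmu_writable_mask & BIT_ULL(11)) ? 1 : 0;
}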
@@ -5,8 +5,6 @@
 
 #include "mmu_internal.h"
 
-#define PT_FIRST_AVAIL_BITS_SHIFT 10
-
 /*
  * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may also
  * be restricted to using write-protection (for L2 when CPU dirty logging, i.e.
@@ -59,9 +57,8 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 	(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
-#define SPTE_HOST_WRITEABLE	(1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
-#define SPTE_MMU_WRITEABLE	(1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))
-
+#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(10)
+#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(11)
 
 /*
  * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of
@@ -100,6 +97,8 @@ static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 11);
 
 #define MMIO_SPTE_GEN_MASK		GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + MMIO_SPTE_GEN_HIGH_BITS - 1, 0)
 
+extern u64 __read_mostly shadow_host_writable_mask;
+extern u64 __read_mostly shadow_mmu_writable_mask;
 extern u64 __read_mostly shadow_nx_mask;
 extern u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 extern u64 __read_mostly shadow_user_mask;
@@ -264,8 +263,8 @@ static inline bool is_dirty_spte(u64 spte)
 
 static inline bool spte_can_locklessly_be_made_writable(u64 spte)
 {
-	return (spte & (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE)) ==
-		(SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE);
+	return (spte & shadow_host_writable_mask) &&
+	       (spte & shadow_mmu_writable_mask);
 }
 
 static inline u64 get_mmio_spte_generation(u64 spte)
@@ -1335,7 +1335,7 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
 
 /*
  * Removes write access on the last level SPTE mapping this GFN and unsets the
- * SPTE_MMU_WRITABLE bit to ensure future writes continue to be intercepted.
+ * MMU-writable bit to ensure future writes continue to be intercepted.
  * Returns true if an SPTE was set and a TLB flush is needed.
  */
 static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
@@ -1352,7 +1352,7 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 			break;
 
 		new_spte = iter.old_spte &
-			~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE);
+			~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
 
 		tdp_mmu_set_spte(kvm, &iter, new_spte);
 		spte_set = true;
@@ -1365,7 +1365,7 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 
 /*
  * Removes write access on the last level SPTE mapping this GFN and unsets the
- * SPTE_MMU_WRITABLE bit to ensure future writes continue to be intercepted.
+ * MMU-writable bit to ensure future writes continue to be intercepted.
  * Returns true if an SPTE was set and a TLB flush is needed.
  */
 bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,