Commit ab978c62 authored by Paolo Bonzini

Merge branch 'kvm-6.11-sev-snp' into HEAD

Pull base x86 KVM support for running SEV-SNP guests from Michael Roth:

* add some basic infrastructure and introduce a new KVM_X86_SNP_VM
  vm_type to handle differences versus the existing KVM_X86_SEV_VM and
  KVM_X86_SEV_ES_VM types.

* implement the KVM API to handle the creation of a cryptographic
  launch context, encrypt/measure the initial image into guest memory,
  and finalize it before launching it.

* implement handling for various guest-generated events such as page
  state changes, onlining of additional vCPUs, etc.

* implement the gmem/mmu hooks needed to prepare gmem-allocated pages
  before mapping them into guest private memory ranges as well as
  cleaning them up prior to returning them to the host for use as
  normal memory. Because those cleanup hooks supplant certain
  activities like issuing WBINVDs during KVM MMU invalidations, skip
  that duplicate work to avoid unnecessary overhead.

This merge leaves out support for attestation guest requests
and for loading the signing keys to be used for attestation requests.
parents f9d1b541 b2ec0423
...@@ -466,6 +466,112 @@ issued by the hypervisor to make the guest ready for execution. ...@@ -466,6 +466,112 @@ issued by the hypervisor to make the guest ready for execution.
Returns: 0 on success, -negative on error Returns: 0 on success, -negative on error
18. KVM_SEV_SNP_LAUNCH_START
----------------------------
The KVM_SEV_SNP_LAUNCH_START command is used for creating the memory encryption
context for the SEV-SNP guest. It must be called prior to issuing
KVM_SEV_SNP_LAUNCH_UPDATE or KVM_SEV_SNP_LAUNCH_FINISH.
Parameters (in): struct kvm_sev_snp_launch_start
Returns: 0 on success, -negative on error
::
struct kvm_sev_snp_launch_start {
__u64 policy; /* Guest policy to use. */
__u8 gosvw[16]; /* Guest OS visible workarounds. */
__u16 flags; /* Must be zero. */
__u8 pad0[6];
__u64 pad1[4];
};
See SNP_LAUNCH_START in the SEV-SNP specification [snp-fw-abi]_ for further
details on the input parameters in ``struct kvm_sev_snp_launch_start``.
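A minimal sketch of how a VMM might prepare the argument for this command. The struct is mirrored locally as ``snp_launch_start`` so the example builds without SNP-aware kernel headers, and ``prepare_snp_launch_start()`` is a hypothetical helper, not part of the KVM API. In a real VMM the filled struct would be handed to KVM through the ``data`` field of ``struct kvm_sev_cmd`` with the KVM_MEMORY_ENCRYPT_OP ioctl, as for the existing SEV commands.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_sev_snp_launch_start (illustration only). */
struct snp_launch_start {
    uint64_t policy;     /* Guest policy to use. */
    uint8_t  gosvw[16];  /* Guest OS visible workarounds. */
    uint16_t flags;      /* Must be zero. */
    uint8_t  pad0[6];
    uint64_t pad1[4];
};

/* The documented fields add up to 64 bytes with no implicit padding. */
_Static_assert(sizeof(struct snp_launch_start) == 64, "unexpected layout");

/*
 * Hypothetical helper: zero the whole struct first so that flags and the
 * padding fields are guaranteed to be zero, then fill in the inputs.
 * gosvw may be NULL, in which case the field stays all-zero.
 */
static void prepare_snp_launch_start(struct snp_launch_start *start,
                                     uint64_t policy,
                                     const uint8_t *gosvw)
{
    assert(start != NULL);
    memset(start, 0, sizeof(*start));
    start->policy = policy;
    if (gosvw)
        memcpy(start->gosvw, gosvw, sizeof(start->gosvw));
}
```

Zeroing before filling is a defensive choice: it keeps ``flags`` and the reserved padding at zero even if the struct lived in uninitialized stack memory.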
19. KVM_SEV_SNP_LAUNCH_UPDATE
-----------------------------
The KVM_SEV_SNP_LAUNCH_UPDATE command is used for loading userspace-provided
data into a guest GPA range, measuring the contents into the SNP guest context
created by KVM_SEV_SNP_LAUNCH_START, and then encrypting/validating that GPA
range so that it will be immediately readable using the encryption key
associated with the guest context once it is booted, after which point it can
attest the measurement associated with its context before unlocking any
secrets.
It is required that the GPA ranges initialized by this command have had the
KVM_MEMORY_ATTRIBUTE_PRIVATE attribute set in advance. See the documentation
for KVM_SET_MEMORY_ATTRIBUTES for more details on this aspect.
Upon success, this command is not guaranteed to have processed the entire
range requested. Instead, the ``gfn_start``, ``uaddr``, and ``len`` fields of
``struct kvm_sev_snp_launch_update`` will be updated to correspond to the
remaining range that has yet to be processed. The caller should continue
calling this command until those fields indicate the entire range has been
processed, e.g. ``len`` is 0, ``gfn_start`` is equal to the last GFN in the
range plus 1, and ``uaddr`` is the last byte of the userspace-provided source
buffer address plus 1. In the case where ``type`` is KVM_SEV_SNP_PAGE_TYPE_ZERO,
``uaddr`` will be ignored completely.
Parameters (in): struct kvm_sev_snp_launch_update
Returns: 0 on success, < 0 on error, -EAGAIN if caller should retry
::
struct kvm_sev_snp_launch_update {
__u64 gfn_start; /* Guest page number to load/encrypt data into. */
__u64 uaddr; /* Userspace address of data to be loaded/encrypted. */
__u64 len; /* 4k-aligned length in bytes to copy into guest memory.*/
__u8 type; /* The type of the guest pages being initialized. */
__u8 pad0;
__u16 flags; /* Must be zero. */
__u32 pad1;
__u64 pad2[4];
};
where the allowed values for ``type`` are #define'd as::
KVM_SEV_SNP_PAGE_TYPE_NORMAL
KVM_SEV_SNP_PAGE_TYPE_ZERO
KVM_SEV_SNP_PAGE_TYPE_UNMEASURED
KVM_SEV_SNP_PAGE_TYPE_SECRETS
KVM_SEV_SNP_PAGE_TYPE_CPUID
See the SEV-SNP spec [snp-fw-abi]_ for further details on how each page type is
used/measured.
20. KVM_SEV_SNP_LAUNCH_FINISH
-----------------------------
After completion of the SNP guest launch flow, the KVM_SEV_SNP_LAUNCH_FINISH
command can be issued to make the guest ready for execution.
Parameters (in): struct kvm_sev_snp_launch_finish
Returns: 0 on success, -negative on error
::
struct kvm_sev_snp_launch_finish {
__u64 id_block_uaddr;
__u64 id_auth_uaddr;
__u8 id_block_en;
__u8 auth_key_en;
__u8 vcek_disabled;
__u8 host_data[32];
__u8 pad0[3];
__u16 flags; /* Must be zero */
__u64 pad1[4];
};
See SNP_LAUNCH_FINISH in the SEV-SNP specification [snp-fw-abi]_ for further
details on the input parameters in ``struct kvm_sev_snp_launch_finish``.
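As a rough illustration of how the fields relate, the sketch below fills a local mirror of the struct for the two common cases: no ID block, or an ID block plus its authentication structure. ``prepare_snp_launch_finish()`` and its parameters are hypothetical, and the rule that both buffers are supplied together is an assumption of this sketch; the firmware specification remains the authority on which combinations are valid.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SNP_FINISH_DATA_SIZE 32 /* size of host_data[] in the struct above */

/* Local mirror of struct kvm_sev_snp_launch_finish (illustration only). */
struct snp_launch_finish {
    uint64_t id_block_uaddr;
    uint64_t id_auth_uaddr;
    uint8_t  id_block_en;
    uint8_t  auth_key_en;
    uint8_t  vcek_disabled;
    uint8_t  host_data[SNP_FINISH_DATA_SIZE];
    uint8_t  pad0[3];
    uint16_t flags;      /* Must be zero */
    uint64_t pad1[4];
};

_Static_assert(sizeof(struct snp_launch_finish) == 88, "unexpected layout");

/*
 * Hypothetical helper. id_block/id_auth are userspace buffers and may both
 * be NULL; host_data may be NULL to leave the field all-zero.
 */
static void prepare_snp_launch_finish(struct snp_launch_finish *fin,
                                      const void *id_block,
                                      const void *id_auth,
                                      int auth_key_en,
                                      const uint8_t *host_data)
{
    assert(fin != NULL);
    memset(fin, 0, sizeof(*fin)); /* flags and padding stay zero */

    if (id_block && id_auth) {
        fin->id_block_uaddr = (uint64_t)(uintptr_t)id_block;
        fin->id_auth_uaddr = (uint64_t)(uintptr_t)id_auth;
        fin->id_block_en = 1;
        fin->auth_key_en = auth_key_en ? 1 : 0;
    }
    if (host_data)
        memcpy(fin->host_data, host_data, SNP_FINISH_DATA_SIZE);
}
```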
Device attribute API Device attribute API
==================== ====================
...@@ -497,9 +603,11 @@ References ...@@ -497,9 +603,11 @@ References
========== ==========
See [white-paper]_, [api-spec]_, [amd-apm]_ and [kvm-forum]_ for more info. See [white-paper]_, [api-spec]_, [amd-apm]_, [kvm-forum]_, and [snp-fw-abi]_
for more info.
.. [white-paper] https://developer.amd.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf .. [white-paper] https://developer.amd.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
.. [api-spec] https://support.amd.com/TechDocs/55766_SEV-KM_API_Specification.pdf .. [api-spec] https://support.amd.com/TechDocs/55766_SEV-KM_API_Specification.pdf
.. [amd-apm] https://support.amd.com/TechDocs/24593.pdf (section 15.34) .. [amd-apm] https://support.amd.com/TechDocs/24593.pdf (section 15.34)
.. [kvm-forum] https://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf .. [kvm-forum] https://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf
.. [snp-fw-abi] https://www.amd.com/system/files/TechDocs/56860.pdf
...@@ -139,6 +139,9 @@ KVM_X86_OP(vcpu_deliver_sipi_vector) ...@@ -139,6 +139,9 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons); KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL(get_untagged_addr) KVM_X86_OP_OPTIONAL(get_untagged_addr)
KVM_X86_OP_OPTIONAL(alloc_apic_backing_page) KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
KVM_X86_OP_OPTIONAL_RET0(private_max_mapping_level)
KVM_X86_OP_OPTIONAL(gmem_invalidate)
#undef KVM_X86_OP #undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL #undef KVM_X86_OP_OPTIONAL
......
...@@ -121,6 +121,7 @@ ...@@ -121,6 +121,7 @@
KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_HV_TLB_FLUSH \ #define KVM_REQ_HV_TLB_FLUSH \
KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)
#define CR0_RESERVED_BITS \ #define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
...@@ -1812,6 +1813,9 @@ struct kvm_x86_ops { ...@@ -1812,6 +1813,9 @@ struct kvm_x86_ops {
gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags); gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu); void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
int (*private_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn);
}; };
struct kvm_x86_nested_ops { struct kvm_x86_nested_ops {
...@@ -1939,6 +1943,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm, ...@@ -1939,6 +1943,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
const struct kvm_memory_slot *memslot); const struct kvm_memory_slot *memslot);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen); void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages); void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3); int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
......
...@@ -59,6 +59,14 @@ ...@@ -59,6 +59,14 @@
#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12 #define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0) #define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)
/* Preferred GHCB GPA Request */
#define GHCB_MSR_PREF_GPA_REQ 0x010
#define GHCB_MSR_GPA_VALUE_POS 12
#define GHCB_MSR_GPA_VALUE_MASK GENMASK_ULL(51, 0)
#define GHCB_MSR_PREF_GPA_RESP 0x011
#define GHCB_MSR_PREF_GPA_NONE 0xfffffffffffff
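For orientation, a sketch of how these constants could combine into a "no preferred GPA" reply in the GHCB MSR. It assumes the usual MSR-protocol layout, with the response code in GHCBData[11:0] and the value in the bits above it. The macro names are local stand-ins for the definitions above, and ``ghcb_msr_response()`` is a hypothetical helper. The KVM code that actually builds this reply is not shown in this diff.

```c
#include <stdint.h>

#define PREF_GPA_RESP  0x011ULL            /* GHCB_MSR_PREF_GPA_RESP */
#define GPA_VALUE_POS  12                  /* GHCB_MSR_GPA_VALUE_POS */
#define GPA_VALUE_MASK ((1ULL << 52) - 1)  /* GENMASK_ULL(51, 0) */
#define PREF_GPA_NONE  0xfffffffffffffULL  /* GHCB_MSR_PREF_GPA_NONE */

/* Pack a 52-bit value above a 12-bit response code. */
static uint64_t ghcb_msr_response(uint64_t info, uint64_t value)
{
    return ((value & GPA_VALUE_MASK) << GPA_VALUE_POS) | (info & 0xfffULL);
}

/* Reply telling the guest that the hypervisor has no preferred GHCB GPA. */
static uint64_t pref_gpa_none_response(void)
{
    return ghcb_msr_response(PREF_GPA_RESP, PREF_GPA_NONE);
}
```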
/* GHCB GPA Register */ /* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012 #define GHCB_MSR_REG_GPA_REQ 0x012
#define GHCB_MSR_REG_GPA_REQ_VAL(v) \ #define GHCB_MSR_REG_GPA_REQ_VAL(v) \
...@@ -93,11 +101,17 @@ enum psc_op { ...@@ -93,11 +101,17 @@ enum psc_op {
/* GHCBData[11:0] */ \ /* GHCBData[11:0] */ \
GHCB_MSR_PSC_REQ) GHCB_MSR_PSC_REQ)
#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
#define GHCB_MSR_PSC_RESP 0x015 #define GHCB_MSR_PSC_RESP 0x015
#define GHCB_MSR_PSC_RESP_VAL(val) \ #define GHCB_MSR_PSC_RESP_VAL(val) \
/* GHCBData[63:32] */ \ /* GHCBData[63:32] */ \
(((u64)(val) & GENMASK_ULL(63, 32)) >> 32) (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
/* Set highest bit as a generic error response */
#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
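The bit layout behind these macros can be checked with a small userspace round trip. This is only a sketch: ``MASK_ULL()`` is a local stand-in for the kernel's ``GENMASK_ULL()``, the request code 0x014 for GHCB_MSR_PSC_REQ is assumed rather than taken from this diff, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for GENMASK_ULL(h, l): bits h..l set, inclusive. */
#define MASK_ULL(h, l) ((~0ULL >> (63 - (h))) & (~0ULL << (l)))

#define PSC_REQ        0x014ULL /* assumed value of GHCB_MSR_PSC_REQ */
#define PSC_RESP       0x015ULL /* GHCB_MSR_PSC_RESP */
#define PSC_RESP_ERROR ((1ULL << 63) | PSC_RESP)

/* Guest side: pack a GFN and a 4-bit operation into a PSC request. */
static uint64_t psc_make_req(uint64_t gfn, unsigned int op)
{
    assert(op <= 0xf);
    return ((uint64_t)op << 52) | ((gfn << 12) & MASK_ULL(51, 12)) | PSC_REQ;
}

/* Hypervisor side: the two decoders mirror the *_TO_GFN and *_TO_OP macros. */
static uint64_t psc_req_to_gfn(uint64_t msr)
{
    return (msr & MASK_ULL(51, 12)) >> 12;
}

static unsigned int psc_req_to_op(uint64_t msr)
{
    return (unsigned int)((msr & MASK_ULL(55, 52)) >> 52);
}

/* Guest side: extract the 32-bit result from GHCBData[63:32]. */
static uint32_t psc_resp_val(uint64_t msr)
{
    return (uint32_t)((msr & MASK_ULL(63, 32)) >> 32);
}
```

With this layout the generic error response decodes to a non-zero result with only its top bit set, so a guest that treats any non-zero value as failure needs no knowledge of specific error codes.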
/* GHCB Hypervisor Feature Request/Response */ /* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080 #define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081 #define GHCB_MSR_HV_FT_RESP 0x081
...@@ -115,8 +129,19 @@ enum psc_op { ...@@ -115,8 +129,19 @@ enum psc_op {
* The VMGEXIT_PSC_MAX_ENTRY determines the size of the PSC structure, which * The VMGEXIT_PSC_MAX_ENTRY determines the size of the PSC structure, which
* is a local stack variable in set_pages_state(). Do not increase this value * is a local stack variable in set_pages_state(). Do not increase this value
* without evaluating the impact to stack usage. * without evaluating the impact to stack usage.
*
* Use VMGEXIT_PSC_MAX_COUNT in cases where the actual GHCB-defined max value
* is needed, such as when processing GHCB requests on the hypervisor side.
*/ */
#define VMGEXIT_PSC_MAX_ENTRY 64 #define VMGEXIT_PSC_MAX_ENTRY 64
#define VMGEXIT_PSC_MAX_COUNT 253
#define VMGEXIT_PSC_ERROR_GENERIC (0x100UL << 32)
#define VMGEXIT_PSC_ERROR_INVALID_HDR ((1UL << 32) | 1)
#define VMGEXIT_PSC_ERROR_INVALID_ENTRY ((1UL << 32) | 2)
#define VMGEXIT_PSC_OP_PRIVATE 1
#define VMGEXIT_PSC_OP_SHARED 2
struct psc_hdr { struct psc_hdr {
u16 cur_entry; u16 cur_entry;
......
...@@ -91,6 +91,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs); ...@@ -91,6 +91,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* RMUPDATE detected 4K page and 2MB page overlap. */ /* RMUPDATE detected 4K page and 2MB page overlap. */
#define RMPUPDATE_FAIL_OVERLAP 4 #define RMPUPDATE_FAIL_OVERLAP 4
/* PSMASH failed due to concurrent access by another CPU */
#define PSMASH_FAIL_INUSE 3
/* RMP page size */ /* RMP page size */
#define RMP_PG_SIZE_4K 0 #define RMP_PG_SIZE_4K 0
#define RMP_PG_SIZE_2M 1 #define RMP_PG_SIZE_2M 1
......
...@@ -285,7 +285,14 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_ ...@@ -285,7 +285,14 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF) #define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5) #define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
#define SVM_SEV_FEAT_RESTRICTED_INJECTION BIT(3)
#define SVM_SEV_FEAT_ALTERNATE_INJECTION BIT(4)
#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
#define SVM_SEV_FEAT_INT_INJ_MODES \
(SVM_SEV_FEAT_RESTRICTED_INJECTION | \
SVM_SEV_FEAT_ALTERNATE_INJECTION)
struct vmcb_seg { struct vmcb_seg {
u16 selector; u16 selector;
......
...@@ -697,6 +697,11 @@ enum sev_cmd_id { ...@@ -697,6 +697,11 @@ enum sev_cmd_id {
/* Second time is the charm; improved versions of the above ioctls. */ /* Second time is the charm; improved versions of the above ioctls. */
KVM_SEV_INIT2, KVM_SEV_INIT2,
/* SNP-specific commands */
KVM_SEV_SNP_LAUNCH_START = 100,
KVM_SEV_SNP_LAUNCH_UPDATE,
KVM_SEV_SNP_LAUNCH_FINISH,
KVM_SEV_NR_MAX, KVM_SEV_NR_MAX,
}; };
...@@ -824,6 +829,48 @@ struct kvm_sev_receive_update_data { ...@@ -824,6 +829,48 @@ struct kvm_sev_receive_update_data {
__u32 pad2; __u32 pad2;
}; };
struct kvm_sev_snp_launch_start {
__u64 policy;
__u8 gosvw[16];
__u16 flags;
__u8 pad0[6];
__u64 pad1[4];
};
/* Kept in sync with firmware values for simplicity. */
#define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1
#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4
#define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5
#define KVM_SEV_SNP_PAGE_TYPE_CPUID 0x6
struct kvm_sev_snp_launch_update {
__u64 gfn_start;
__u64 uaddr;
__u64 len;
__u8 type;
__u8 pad0;
__u16 flags;
__u32 pad1;
__u64 pad2[4];
};
#define KVM_SEV_SNP_ID_BLOCK_SIZE 96
#define KVM_SEV_SNP_ID_AUTH_SIZE 4096
#define KVM_SEV_SNP_FINISH_DATA_SIZE 32
struct kvm_sev_snp_launch_finish {
__u64 id_block_uaddr;
__u64 id_auth_uaddr;
__u8 id_block_en;
__u8 auth_key_en;
__u8 vcek_disabled;
__u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
__u8 pad0[3];
__u16 flags;
__u64 pad1[4];
};
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0) #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1) #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
...@@ -874,5 +921,6 @@ struct kvm_hyperv_eventfd { ...@@ -874,5 +921,6 @@ struct kvm_hyperv_eventfd {
#define KVM_X86_SW_PROTECTED_VM 1 #define KVM_X86_SW_PROTECTED_VM 1
#define KVM_X86_SEV_VM 2 #define KVM_X86_SEV_VM 2
#define KVM_X86_SEV_ES_VM 3 #define KVM_X86_SEV_ES_VM 3
#define KVM_X86_SNP_VM 4
#endif /* _ASM_X86_KVM_H */ #endif /* _ASM_X86_KVM_H */
...@@ -139,6 +139,9 @@ config KVM_AMD_SEV ...@@ -139,6 +139,9 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64 depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m) depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select ARCH_HAS_CC_PLATFORM select ARCH_HAS_CC_PLATFORM
select KVM_GENERIC_PRIVATE_MEM
select HAVE_KVM_GMEM_PREPARE
select HAVE_KVM_GMEM_INVALIDATE
help help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors. with Encrypted State (SEV-ES) on AMD processors.
......
...@@ -253,8 +253,6 @@ static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm) ...@@ -253,8 +253,6 @@ static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
return __kvm_mmu_honors_guest_mtrrs(kvm_arch_has_noncoherent_dma(kvm)); return __kvm_mmu_honors_guest_mtrrs(kvm_arch_has_noncoherent_dma(kvm));
} }
void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu); int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
int kvm_mmu_post_init_vm(struct kvm *kvm); int kvm_mmu_post_init_vm(struct kvm *kvm);
......
...@@ -3308,7 +3308,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu, ...@@ -3308,7 +3308,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
return RET_PF_CONTINUE; return RET_PF_CONTINUE;
} }
static bool page_fault_can_be_fast(struct kvm_page_fault *fault) static bool page_fault_can_be_fast(struct kvm *kvm, struct kvm_page_fault *fault)
{ {
/* /*
* Page faults with reserved bits set, i.e. faults on MMIO SPTEs, only * Page faults with reserved bits set, i.e. faults on MMIO SPTEs, only
...@@ -3319,6 +3319,26 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault) ...@@ -3319,6 +3319,26 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
if (fault->rsvd) if (fault->rsvd)
return false; return false;
/*
* For hardware-protected VMs, certain conditions like attempting to
* perform a write to a page which is not in the state that the guest
* expects it to be in can result in a nested/extended #PF. In this
* case, the below code might misconstrue this situation as being the
* result of a write-protected access, and treat it as a spurious case
* rather than taking any action to satisfy the real source of the #PF
* such as generating a KVM_EXIT_MEMORY_FAULT. This can lead to the
* guest spinning on a #PF indefinitely, so don't attempt the fast path
* in this case.
*
* Note that the kvm_mem_is_private() check might race with an
* attribute update, but this will either result in the guest spinning
* on RET_PF_SPURIOUS until the update completes, or an actual spurious
* case might go down the slow path. Either case will resolve itself.
*/
if (kvm->arch.has_private_mem &&
fault->is_private != kvm_mem_is_private(kvm, fault->gfn))
return false;
/* /*
* #PF can be fast if: * #PF can be fast if:
* *
...@@ -3419,7 +3439,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) ...@@ -3419,7 +3439,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
u64 *sptep; u64 *sptep;
uint retry_count = 0; uint retry_count = 0;
if (!page_fault_can_be_fast(fault)) if (!page_fault_can_be_fast(vcpu->kvm, fault))
return ret; return ret;
walk_shadow_page_lockless_begin(vcpu); walk_shadow_page_lockless_begin(vcpu);
...@@ -4291,6 +4311,25 @@ static inline u8 kvm_max_level_for_order(int order) ...@@ -4291,6 +4311,25 @@ static inline u8 kvm_max_level_for_order(int order)
return PG_LEVEL_4K; return PG_LEVEL_4K;
} }
static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
u8 max_level, int gmem_order)
{
u8 req_max_level;
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;
max_level = min(kvm_max_level_for_order(gmem_order), max_level);
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;
req_max_level = static_call(kvm_x86_private_max_mapping_level)(kvm, pfn);
if (req_max_level)
max_level = min(max_level, req_max_level);
return max_level;
}
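The clamping this helper performs can be modelled outside the kernel. The sketch below is a simplified stand-in, not the kernel code: level numbers and order thresholds (9 for 2M, 18 for 1G) mirror the usual x86 values, the arch hook is replaced by a plain ``arch_level`` argument where 0 means "no restriction", and the function returns the fully clamped level, which is assumed to be the intended result.

```c
#include <assert.h>
#include <stdint.h>

enum { LEVEL_4K = 1, LEVEL_2M = 2, LEVEL_1G = 3 };

/* Largest mapping level that a gmem allocation of 2^order pages can back. */
static uint8_t level_for_order(int order)
{
    if (order >= 18)
        return LEVEL_1G;
    if (order >= 9)
        return LEVEL_2M;
    return LEVEL_4K;
}

static uint8_t min_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/*
 * Result is the smallest of: the caller's limit, the level allowed by the
 * gmem order, and the arch limit (skipped when arch_level is 0).
 */
static uint8_t max_private_mapping_level(uint8_t max_level, int gmem_order,
                                         uint8_t arch_level)
{
    assert(max_level >= LEVEL_4K);

    if (max_level == LEVEL_4K)
        return LEVEL_4K;

    max_level = min_u8(level_for_order(gmem_order), max_level);
    if (max_level == LEVEL_4K)
        return LEVEL_4K;

    if (arch_level)
        max_level = min_u8(max_level, arch_level);

    return max_level;
}
```

The two early returns avoid consulting the arch limit when the answer is already 4K, which matters in the kernel because the hook may need an RMP lookup.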
static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu, static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault) struct kvm_page_fault *fault)
{ {
...@@ -4308,9 +4347,9 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu, ...@@ -4308,9 +4347,9 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
return r; return r;
} }
fault->max_level = min(kvm_max_level_for_order(max_order),
fault->max_level);
fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY); fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
fault->max_level, max_order);
return RET_PF_CONTINUE; return RET_PF_CONTINUE;
} }
...@@ -6790,6 +6829,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm, ...@@ -6790,6 +6829,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
return need_tlb_flush; return need_tlb_flush;
} }
EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);
static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm, static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
const struct kvm_memory_slot *slot) const struct kvm_memory_slot *slot)
......
This diff is collapsed.
...@@ -1404,6 +1404,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) ...@@ -1404,6 +1404,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
svm->spec_ctrl = 0; svm->spec_ctrl = 0;
svm->virt_spec_ctrl = 0; svm->virt_spec_ctrl = 0;
if (init_event)
sev_snp_init_protected_guest_state(vcpu);
init_vmcb(vcpu); init_vmcb(vcpu);
if (!init_event) if (!init_event)
...@@ -2050,6 +2053,7 @@ static int pf_interception(struct kvm_vcpu *vcpu) ...@@ -2050,6 +2053,7 @@ static int pf_interception(struct kvm_vcpu *vcpu)
static int npf_interception(struct kvm_vcpu *vcpu) static int npf_interception(struct kvm_vcpu *vcpu)
{ {
struct vcpu_svm *svm = to_svm(vcpu); struct vcpu_svm *svm = to_svm(vcpu);
int rc;
u64 fault_address = svm->vmcb->control.exit_info_2; u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1; u64 error_code = svm->vmcb->control.exit_info_1;
...@@ -2063,11 +2067,19 @@ static int npf_interception(struct kvm_vcpu *vcpu) ...@@ -2063,11 +2067,19 @@ static int npf_interception(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK)) if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
error_code &= ~PFERR_SYNTHETIC_MASK; error_code &= ~PFERR_SYNTHETIC_MASK;
if (sev_snp_guest(vcpu->kvm) && (error_code & PFERR_GUEST_ENC_MASK))
error_code |= PFERR_PRIVATE_ACCESS;
trace_kvm_page_fault(vcpu, fault_address, error_code); trace_kvm_page_fault(vcpu, fault_address, error_code);
return kvm_mmu_page_fault(vcpu, fault_address, error_code, rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
static_cpu_has(X86_FEATURE_DECODEASSISTS) ? static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
svm->vmcb->control.insn_bytes : NULL, svm->vmcb->control.insn_bytes : NULL,
svm->vmcb->control.insn_len); svm->vmcb->control.insn_len);
if (rc > 0 && error_code & PFERR_GUEST_RMP_MASK)
sev_handle_rmp_fault(vcpu, fault_address, error_code);
return rc;
} }
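The two decisions added to ``npf_interception()`` can be isolated as pure functions for illustration. The bit positions below are assumptions made for the sketch (the kernel headers are authoritative), and both function names are hypothetical. A positive return code from the MMU is taken here to mean "resume the guest".

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define ERR_GUEST_RMP (1ULL << 31) /* stands in for PFERR_GUEST_RMP_MASK */
#define ERR_GUEST_ENC (1ULL << 34) /* stands in for PFERR_GUEST_ENC_MASK */
#define ERR_PRIVATE   (1ULL << 49) /* stands in for PFERR_PRIVATE_ACCESS */

/* For SNP guests, an access through an encrypted mapping is a private access. */
static uint64_t npf_fixup_error_code(bool snp_guest, uint64_t error_code)
{
    if (snp_guest && (error_code & ERR_GUEST_ENC))
        error_code |= ERR_PRIVATE;
    return error_code;
}

/* RMP faults get extra handling only if the MMU is about to resume the guest. */
static bool npf_needs_rmp_handling(int mmu_rc, uint64_t error_code)
{
    return mmu_rc > 0 && (error_code & ERR_GUEST_RMP) != 0;
}
```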
static int db_interception(struct kvm_vcpu *vcpu) static int db_interception(struct kvm_vcpu *vcpu)
...@@ -4937,8 +4949,11 @@ static int svm_vm_init(struct kvm *kvm) ...@@ -4937,8 +4949,11 @@ static int svm_vm_init(struct kvm *kvm)
if (type != KVM_X86_DEFAULT_VM && if (type != KVM_X86_DEFAULT_VM &&
type != KVM_X86_SW_PROTECTED_VM) { type != KVM_X86_SW_PROTECTED_VM) {
kvm->arch.has_protected_state = (type == KVM_X86_SEV_ES_VM); kvm->arch.has_protected_state =
(type == KVM_X86_SEV_ES_VM || type == KVM_X86_SNP_VM);
to_kvm_sev_info(kvm)->need_init = true; to_kvm_sev_info(kvm)->need_init = true;
kvm->arch.has_private_mem = (type == KVM_X86_SNP_VM);
} }
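The effect of this block on each VM type can be tabulated with a small model. This is a sketch with local names: the enum values follow the KVM_X86_*_VM definitions in the uapi header (with the default type assumed to be 0), and ``vm_arch_flags`` and ``svm_apply_vm_type()`` are hypothetical stand-ins for the fields touched here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    VM_DEFAULT      = 0, /* KVM_X86_DEFAULT_VM (assumed value) */
    VM_SW_PROTECTED = 1, /* KVM_X86_SW_PROTECTED_VM */
    VM_SEV          = 2, /* KVM_X86_SEV_VM */
    VM_SEV_ES       = 3, /* KVM_X86_SEV_ES_VM */
    VM_SNP          = 4, /* KVM_X86_SNP_VM */
};

struct vm_arch_flags {
    bool has_protected_state;
    bool has_private_mem;
    bool sev_need_init;
};

/* Only SEV-family types are touched; other types keep their existing flags. */
static void svm_apply_vm_type(struct vm_arch_flags *f, unsigned long type)
{
    assert(f != NULL);

    if (type != VM_DEFAULT && type != VM_SW_PROTECTED) {
        f->has_protected_state = (type == VM_SEV_ES || type == VM_SNP);
        f->sev_need_init = true;
        f->has_private_mem = (type == VM_SNP);
    }
}
```

In this model an SNP VM is the only type that ends up with both protected register state and private memory, while SEV-ES gets protected state alone.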
if (!pause_filter_count || !pause_filter_thresh) if (!pause_filter_count || !pause_filter_thresh)
...@@ -5095,6 +5110,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { ...@@ -5095,6 +5110,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector, .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons, .vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
.alloc_apic_backing_page = svm_alloc_apic_backing_page, .alloc_apic_backing_page = svm_alloc_apic_backing_page,
.gmem_prepare = sev_gmem_prepare,
.gmem_invalidate = sev_gmem_invalidate,
.private_max_mapping_level = sev_private_max_mapping_level,
}; };
/* /*
......
...@@ -94,6 +94,7 @@ struct kvm_sev_info { ...@@ -94,6 +94,7 @@ struct kvm_sev_info {
struct list_head mirror_entry; /* Use as a list entry of mirrors */ struct list_head mirror_entry; /* Use as a list entry of mirrors */
struct misc_cg *misc_cg; /* For misc cgroup accounting */ struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress; atomic_t migration_in_progress;
void *snp_context; /* SNP guest context page */
}; };
struct kvm_svm { struct kvm_svm {
...@@ -209,6 +210,18 @@ struct vcpu_sev_es_state { ...@@ -209,6 +210,18 @@ struct vcpu_sev_es_state {
u32 ghcb_sa_len; u32 ghcb_sa_len;
bool ghcb_sa_sync; bool ghcb_sa_sync;
bool ghcb_sa_free; bool ghcb_sa_free;
/* SNP Page-State-Change buffer entries currently being processed */
u16 psc_idx;
u16 psc_inflight;
bool psc_2m;
u64 ghcb_registered_gpa;
struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
gpa_t snp_vmsa_gpa;
bool snp_ap_waiting_for_reset;
bool snp_has_guest_vmsa;
}; };
struct vcpu_svm { struct vcpu_svm {
...@@ -350,6 +363,23 @@ static __always_inline bool sev_es_guest(struct kvm *kvm) ...@@ -350,6 +363,23 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
#endif #endif
} }
static __always_inline bool sev_snp_guest(struct kvm *kvm)
{
#ifdef CONFIG_KVM_AMD_SEV
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
return (sev->vmsa_features & SVM_SEV_FEAT_SNP_ACTIVE) &&
!WARN_ON_ONCE(!sev_es_guest(kvm));
#else
return false;
#endif
}
static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
{
return svm->sev_es.ghcb_registered_gpa == val;
}
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb) static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{ {
vmcb->control.clean = 0; vmcb->control.clean = 0;
...@@ -705,6 +735,11 @@ void sev_hardware_unsetup(void); ...@@ -705,6 +735,11 @@ void sev_hardware_unsetup(void);
int sev_cpu_init(struct svm_cpu_data *sd); int sev_cpu_init(struct svm_cpu_data *sd);
int sev_dev_get_attr(u32 group, u64 attr, u64 *val); int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
extern unsigned int max_sev_asid; extern unsigned int max_sev_asid;
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
#else #else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) { static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
...@@ -718,6 +753,18 @@ static inline void sev_hardware_unsetup(void) {} ...@@ -718,6 +753,18 @@ static inline void sev_hardware_unsetup(void) {}
static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; } static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; } static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
#define max_sev_asid 0 #define max_sev_asid 0
static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}
static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
{
return 0;
}
static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
static inline int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
{
return 0;
}
#endif #endif
/* vmenter.S */ /* vmenter.S */
......
...@@ -1834,6 +1834,37 @@ TRACE_EVENT(kvm_vmgexit_msr_protocol_exit, ...@@ -1834,6 +1834,37 @@ TRACE_EVENT(kvm_vmgexit_msr_protocol_exit,
__entry->vcpu_id, __entry->ghcb_gpa, __entry->result) __entry->vcpu_id, __entry->ghcb_gpa, __entry->result)
); );
/*
* Tracepoint for #NPFs due to RMP faults.
*/
TRACE_EVENT(kvm_rmp_fault,
TP_PROTO(struct kvm_vcpu *vcpu, u64 gpa, u64 pfn, u64 error_code,
int rmp_level, int psmash_ret),
TP_ARGS(vcpu, gpa, pfn, error_code, rmp_level, psmash_ret),
TP_STRUCT__entry(
__field(unsigned int, vcpu_id)
__field(u64, gpa)
__field(u64, pfn)
__field(u64, error_code)
__field(int, rmp_level)
__field(int, psmash_ret)
),
TP_fast_assign(
__entry->vcpu_id = vcpu->vcpu_id;
__entry->gpa = gpa;
__entry->pfn = pfn;
__entry->error_code = error_code;
__entry->rmp_level = rmp_level;
__entry->psmash_ret = psmash_ret;
),
TP_printk("vcpu %u gpa %016llx pfn 0x%llx error_code 0x%llx rmp_level %d psmash_ret %d",
__entry->vcpu_id, __entry->gpa, __entry->pfn,
__entry->error_code, __entry->rmp_level, __entry->psmash_ret)
);
#endif /* _TRACE_KVM_H */ #endif /* _TRACE_KVM_H */
#undef TRACE_INCLUDE_PATH #undef TRACE_INCLUDE_PATH
......
...@@ -10930,6 +10930,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) ...@@ -10930,6 +10930,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu)) if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
static_call(kvm_x86_update_cpu_dirty_logging)(vcpu); static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
kvm_vcpu_reset(vcpu, true);
if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
r = 1;
goto out;
}
}
} }
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win || if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
...@@ -13137,6 +13145,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu) ...@@ -13137,6 +13145,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
if (kvm_test_request(KVM_REQ_PMI, vcpu)) if (kvm_test_request(KVM_REQ_PMI, vcpu))
return true; return true;
if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
return true;
if (kvm_arch_interrupt_allowed(vcpu) && if (kvm_arch_interrupt_allowed(vcpu) &&
(kvm_cpu_has_interrupt(vcpu) || (kvm_cpu_has_interrupt(vcpu) ||
kvm_guest_apic_has_interrupt(vcpu))) kvm_guest_apic_has_interrupt(vcpu)))
...@@ -13590,6 +13601,24 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu) ...@@ -13590,6 +13601,24 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
} }
EXPORT_SYMBOL_GPL(kvm_arch_no_poll); EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
bool kvm_arch_gmem_prepare_needed(struct kvm *kvm)
{
return kvm->arch.vm_type == KVM_X86_SNP_VM;
}
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
{
return static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
}
#endif
#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
{
static_call_cond(kvm_x86_gmem_invalidate)(start, end);
}
#endif
int kvm_spec_ctrl_test_value(u64 value) int kvm_spec_ctrl_test_value(u64 value)
{ {
...@@ -13975,6 +14004,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter); ...@@ -13975,6 +14004,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_rmp_fault);
static int __init kvm_x86_init(void) static int __init kvm_x86_init(void)
{ {
......
...@@ -2441,4 +2441,40 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm, ...@@ -2441,4 +2441,40 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
} }
#endif /* CONFIG_KVM_PRIVATE_MEM */ #endif /* CONFIG_KVM_PRIVATE_MEM */
#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
bool kvm_arch_gmem_prepare_needed(struct kvm *kvm);
#endif
/**
* kvm_gmem_populate() - Populate/prepare a GPA range with guest data
*
* @kvm: KVM instance
* @gfn: starting GFN to be populated
* @src: userspace-provided buffer containing data to copy into GFN range
* (passed to @post_populate, and incremented on each iteration
* if not NULL)
* @npages: number of pages to copy from userspace-buffer
* @post_populate: callback to issue for each gmem page that backs the GPA
* range
* @opaque: opaque data to pass to @post_populate callback
*
* This is primarily intended for cases where a gmem-backed GPA range needs
* to be initialized with userspace-provided data prior to being mapped into
 * the guest as a private page. This should be called with kvm->slots_lock
* held so that caller-enforced invariants regarding the expected memory
* attributes of the GPA range do not race with KVM_SET_MEMORY_ATTRIBUTES.
*
 * Returns the number of pages that were populated, or a negative error code
 * if no pages were populated.
*/
typedef int (*kvm_gmem_populate_cb)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
void __user *src, int order, void *opaque);
long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long npages,
kvm_gmem_populate_cb post_populate, void *opaque);
#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
#endif
#endif
@@ -209,6 +209,7 @@ enum mapping_flags {
AS_STABLE_WRITES, /* must wait for writeback before modifying
folio contents */
AS_UNMOVABLE, /* The mapping cannot be moved, ever */
+AS_INACCESSIBLE, /* Do not attempt direct R/W access to the mapping */
};
/**
......
@@ -658,6 +658,7 @@ struct sev_data_snp_launch_update {
 * @id_auth_paddr: system physical address of ID block authentication structure
 * @id_block_en: indicates whether ID block is present
 * @auth_key_en: indicates whether author key is present in authentication structure
+ * @vcek_disabled: indicates whether use of VCEK is allowed for attestation reports
 * @rsvd: reserved
 * @host_data: host-supplied data for guest, not interpreted by firmware
 */
@@ -667,7 +668,8 @@ struct sev_data_snp_launch_finish {
u64 id_auth_paddr;
u8 id_block_en:1;
u8 auth_key_en:1;
-u64 rsvd:62;
+u8 vcek_disabled:1;
+u64 rsvd:61;
u8 host_data[32];
} __packed;
......
@@ -233,7 +233,8 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 * doing a complex calculation here, and then doing the zeroing
 * anyway if the page split fails.
 */
-folio_zero_range(folio, offset, length);
+if (!(folio->mapping->flags & AS_INACCESSIBLE))
+folio_zero_range(folio, offset, length);
if (folio_has_private(folio))
folio_invalidate(folio, offset, length);
......
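The `truncate_inode_partial_folio()` hunk and the `|=` in `__kvm_gmem_create()` both use `AS_INACCESSIBLE` as if it were a mask. The userspace sketch below assumes, which the diff itself does not state, that members of `enum mapping_flags` are bit positions of the kind passed to `set_bit()`/`test_bit()`. Under that assumption it shows how a bit-position test differs from a mask test. The `*_MODEL` enumerators, their numeric values, and both helper functions are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of a flags enum; the values are illustrative bit positions. */
enum mapping_flags_model {
	AS_UNMOVABLE_MODEL = 9,
	AS_INACCESSIBLE_MODEL = 10,
};

/* Set the bit at position 'bit' in '*flags'. */
static void model_set_flag(unsigned long *flags, unsigned int bit)
{
	assert(bit < sizeof(*flags) * 8);
	*flags |= 1UL << bit;
}

/* Report whether the bit at position 'bit' is set in 'flags'. */
static bool model_test_flag(unsigned long flags, unsigned int bit)
{
	assert(bit < sizeof(flags) * 8);
	return (flags >> bit) & 1UL;
}
```

If the assumption holds, `flags & AS_INACCESSIBLE_MODEL` compares against the enumerator's numeric value rather than its bit, so a `test_bit()`-style check may be closer to the intent.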
@@ -109,3 +109,11 @@ config KVM_GENERIC_PRIVATE_MEM
select KVM_GENERIC_MEMORY_ATTRIBUTES
select KVM_PRIVATE_MEM
bool
config HAVE_KVM_GMEM_PREPARE
bool
depends on KVM_PRIVATE_MEM
config HAVE_KVM_GMEM_INVALIDATE
bool
depends on KVM_PRIVATE_MEM
@@ -13,14 +13,50 @@ struct kvm_gmem {
struct list_head entry;
};
-static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
+static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
{
#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
struct list_head *gmem_list = &inode->i_mapping->i_private_list;
struct kvm_gmem *gmem;
list_for_each_entry(gmem, gmem_list, entry) {
struct kvm_memory_slot *slot;
struct kvm *kvm = gmem->kvm;
struct page *page;
kvm_pfn_t pfn;
gfn_t gfn;
int rc;
if (!kvm_arch_gmem_prepare_needed(kvm))
continue;
slot = xa_load(&gmem->bindings, index);
if (!slot)
continue;
page = folio_file_page(folio, index);
pfn = page_to_pfn(page);
gfn = slot->base_gfn + index - slot->gmem.pgoff;
rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
if (rc) {
pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
index, gfn, pfn, rc);
return rc;
}
}
#endif
return 0;
}
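`kvm_gmem_prepare_folio()` converts a gmem file page index to a GFN with `slot->base_gfn + index - slot->gmem.pgoff`, while `__kvm_gmem_get_pfn()` applies the inverse. A minimal userspace sketch of that pair of translations follows. `struct slot_model` and both helpers are hypothetical stand-ins, and the bounds asserts are an added assumption rather than something the kernel code performs at this point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the memslot fields used by the translation. */
struct slot_model {
	uint64_t base_gfn;   /* first guest frame number covered by the slot */
	uint64_t npages;     /* number of pages in the slot */
	uint64_t gmem_pgoff; /* page offset of the slot within the gmem file */
};

/* File page index -> GFN, mirroring the expression in kvm_gmem_prepare_folio(). */
static uint64_t index_to_gfn(const struct slot_model *s, uint64_t index)
{
	assert(index >= s->gmem_pgoff && index - s->gmem_pgoff < s->npages);
	return s->base_gfn + index - s->gmem_pgoff;
}

/* GFN -> file page index, mirroring the expression in __kvm_gmem_get_pfn(). */
static uint64_t gfn_to_index(const struct slot_model *s, uint64_t gfn)
{
	assert(gfn >= s->base_gfn && gfn - s->base_gfn < s->npages);
	return gfn - s->base_gfn + s->gmem_pgoff;
}
```

For any index inside the slot, `gfn_to_index(s, index_to_gfn(s, index))` yields the original index.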
+static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prepare)
{
struct folio *folio;
/* TODO: Support huge pages. */
folio = filemap_grab_folio(inode->i_mapping, index);
-if (IS_ERR_OR_NULL(folio))
-return NULL;
+if (IS_ERR(folio))
+return folio;
/*
 * Use the up-to-date flag to track whether or not the memory has been
@@ -41,6 +77,15 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
folio_mark_uptodate(folio);
}
if (prepare) {
int r = kvm_gmem_prepare_folio(inode, index, folio);
if (r < 0) {
folio_unlock(folio);
folio_put(folio);
return ERR_PTR(r);
}
}
/*
 * Ignore accessed, referenced, and dirty flags. The memory is
 * unevictable and there is no storage to write back to.
@@ -145,9 +190,9 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
break;
}
-folio = kvm_gmem_get_folio(inode, index);
-if (!folio) {
-r = -ENOMEM;
+folio = kvm_gmem_get_folio(inode, index, true);
+if (IS_ERR(folio)) {
+r = PTR_ERR(folio);
break;
}
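These hunks switch `kvm_gmem_get_folio()` from returning NULL on failure to returning an error-encoded pointer, so callers can propagate the real error instead of assuming `-ENOMEM`. The userspace sketch below models that convention. The `model_*` helpers and `MODEL_MAX_ERRNO` are hypothetical names that imitate the kernel's `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()`; they are not the kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed size of the errno range reserved at the top of the address space. */
#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *model_err_ptr(long err)
{
	assert(err < 0 && err >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* True if the pointer value falls in the reserved errno range. */
static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the negative errno value from an error-encoded pointer. */
static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}
```

A NULL pointer is not an error under this encoding, which is why the new code tests `IS_ERR(folio)` rather than `!folio`.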
@@ -298,10 +343,24 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
return MF_DELAYED;
}
#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
static void kvm_gmem_free_folio(struct folio *folio)
{
struct page *page = folio_page(folio, 0);
kvm_pfn_t pfn = page_to_pfn(page);
int order = folio_order(folio);
kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
}
#endif
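`kvm_gmem_free_folio()` hands the architecture hook a half-open PFN range, `[pfn, pfn + (1ul << order))`, that covers every page of the folio being freed. A small userspace sketch of that arithmetic follows. `struct pfn_range` and `folio_pfn_range()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open range of page frame numbers: [start, end). */
struct pfn_range {
	uint64_t start;
	uint64_t end;
};

/* Range covered by a folio of 2^order pages whose first page is first_pfn. */
static struct pfn_range folio_pfn_range(uint64_t first_pfn, unsigned int order)
{
	struct pfn_range r;

	assert(order < 64);
	r.start = first_pfn;
	r.end = first_pfn + (UINT64_C(1) << order);
	return r;
}
```

An order-0 folio yields a one-page range, and an order-9 folio yields 512 pages.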
static const struct address_space_operations kvm_gmem_aops = {
.dirty_folio = noop_dirty_folio,
.migrate_folio = kvm_gmem_migrate_folio,
.error_remove_folio = kvm_gmem_error_folio,
#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
.free_folio = kvm_gmem_free_folio,
#endif
};
static int kvm_gmem_getattr(struct mnt_idmap *idmap, const struct path *path,
@@ -357,6 +416,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
inode->i_private = (void *)(unsigned long)flags;
inode->i_op = &kvm_gmem_iops;
inode->i_mapping->a_ops = &kvm_gmem_aops;
+inode->i_mapping->flags |= AS_INACCESSIBLE;
inode->i_mode |= S_IFREG;
inode->i_size = size;
mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
@@ -482,32 +542,29 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
fput(file);
}
-int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
+static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
+gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prepare)
{
pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
-struct kvm_gmem *gmem;
+struct kvm_gmem *gmem = file->private_data;
struct folio *folio;
struct page *page;
-struct file *file;
int r;
-file = kvm_gmem_get_file(slot);
-if (!file)
+if (file != slot->gmem.file) {
+WARN_ON_ONCE(slot->gmem.file);
return -EFAULT;
+}
gmem = file->private_data;
-if (WARN_ON_ONCE(xa_load(&gmem->bindings, index) != slot)) {
-r = -EIO;
-goto out_fput;
+if (xa_load(&gmem->bindings, index) != slot) {
+WARN_ON_ONCE(xa_load(&gmem->bindings, index));
+return -EIO;
}
-folio = kvm_gmem_get_folio(file_inode(file), index);
-if (!folio) {
-r = -ENOMEM;
-goto out_fput;
-}
+folio = kvm_gmem_get_folio(file_inode(file), index, prepare);
+if (IS_ERR(folio))
+return PTR_ERR(folio);
if (folio_test_hwpoison(folio)) {
r = -EHWPOISON;
@@ -524,9 +581,73 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
out_unlock:
folio_unlock(folio);
-out_fput:
-fput(file);
return r;
}
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
{
struct file *file = kvm_gmem_get_file(slot);
int r;
if (!file)
return -EFAULT;
r = __kvm_gmem_get_pfn(file, slot, gfn, pfn, max_order, true);
fput(file);
return r;
}
EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
kvm_gmem_populate_cb post_populate, void *opaque)
{
struct file *file;
struct kvm_memory_slot *slot;
void __user *p;
int ret = 0, max_order;
long i;
lockdep_assert_held(&kvm->slots_lock);
if (npages < 0)
return -EINVAL;
slot = gfn_to_memslot(kvm, start_gfn);
if (!kvm_slot_can_be_private(slot))
return -EINVAL;
file = kvm_gmem_get_file(slot);
if (!file)
return -EFAULT;
filemap_invalidate_lock(file->f_mapping);
npages = min_t(ulong, slot->npages - (start_gfn - slot->base_gfn), npages);
for (i = 0; i < npages; i += (1 << max_order)) {
gfn_t gfn = start_gfn + i;
kvm_pfn_t pfn;
ret = __kvm_gmem_get_pfn(file, slot, gfn, &pfn, &max_order, false);
if (ret)
break;
if (!IS_ALIGNED(gfn, (1 << max_order)) ||
(npages - i) < (1 << max_order))
max_order = 0;
p = src ? src + i * PAGE_SIZE : NULL;
ret = post_populate(kvm, gfn, pfn, p, max_order, opaque);
put_page(pfn_to_page(pfn));
if (ret)
break;
}
filemap_invalidate_unlock(file->f_mapping);
fput(file);
return ret && !i ? ret : i;
}
EXPORT_SYMBOL_GPL(kvm_gmem_populate);
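The loop in `kvm_gmem_populate()` advances by `1 << max_order` pages per callback, drops to order 0 when the GFN is misaligned or fewer pages remain than the order covers, and returns an error only if nothing was populated. The userspace sketch below models just that control flow. `populate_model()`, `populate_cb_model`, `struct cb_state`, `counting_cb()` and the `backing_order` parameter (standing in for the order reported by `__kvm_gmem_get_pfn()`) are all hypothetical; no locking, memslot lookup, or page handling is modeled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback type standing in for kvm_gmem_populate_cb. */
typedef int (*populate_cb_model)(unsigned long gfn, int order, void *opaque);

/* Hypothetical callback state: fail on the call whose index is fail_on_call. */
struct cb_state {
	long calls;
	long fail_on_call; /* negative means never fail */
};

static int counting_cb(unsigned long gfn, int order, void *opaque)
{
	struct cb_state *st = opaque;

	(void)gfn;
	(void)order;
	if (st->calls == st->fail_on_call)
		return -EIO;
	st->calls++;
	return 0;
}

/* Model of the populate loop: returns pages done, or an error if none were. */
static long populate_model(unsigned long start_gfn, long npages,
			   int backing_order, populate_cb_model cb, void *opaque)
{
	int ret = 0, max_order = 0;
	long i;

	assert(cb);
	if (npages < 0)
		return -EINVAL;

	for (i = 0; i < npages; i += 1L << max_order) {
		unsigned long gfn = start_gfn + i;

		max_order = backing_order;
		/* Fall back to a single page if misaligned or too few pages remain. */
		if ((gfn & ((1UL << max_order) - 1)) ||
		    (npages - i) < (1L << max_order))
			max_order = 0;

		ret = cb(gfn, max_order, opaque);
		if (ret)
			break;
	}
	/* Report the error only when no page was populated at all. */
	return ret && !i ? ret : i;
}
```

Because a partial run returns the count of completed pages rather than the error, a caller modeled on this sketch would compare the return value with the requested page count to detect a short populate.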