    KVM: x86/mmu: Extend Eager Page Splitting to nested MMUs · ada51a9d
    David Matlack authored
    Add support for Eager Page Splitting pages that are mapped by nested
    MMUs. Walk through the rmap first splitting all 1GiB pages to 2MiB
    pages, and then splitting all 2MiB pages to 4KiB pages.
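The top-down walk order can be sketched as follows. This is a minimal, self-contained illustration, not KVM's actual code: the function names and level constants are hypothetical stand-ins for the rmap walk described above.

```c
#include <assert.h>

/* Illustrative page-table level numbering (hypothetical, mirrors
 * KVM's PG_LEVEL_* convention). */
#define PG_LEVEL_4K 1
#define PG_LEVEL_2M 2
#define PG_LEVEL_1G 3

/* Record which levels were visited, and in what order. */
static int visit_order[2];
static int nvisits;

static void split_huge_pages_at_level(int level)
{
	/* In KVM this would walk the rmap for 'level' and split every
	 * huge SPTE found at that level; here we only record the visit. */
	visit_order[nvisits++] = level;
}

static void eager_split_all(void)
{
	int level;

	/* Split 1GiB pages into 2MiB pages first, then 2MiB pages into
	 * 4KiB pages, matching the two-pass order described above. */
	for (level = PG_LEVEL_1G; level > PG_LEVEL_4K; level--)
		split_huge_pages_at_level(level);
}
```

Walking from the highest level down means a 1GiB region is first broken into 2MiB pieces, which the second pass then sees and breaks into 4KiB pieces.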
    
    Note, Eager Page Splitting is limited to nested MMUs as a policy rather
    than due to any technical reason (the sp->role.guest_mode check could
    just be deleted and Eager Page Splitting would work correctly for all
    shadow MMU pages). There is really no reason to support Eager Page
    Splitting for tdp_mmu=N, since such support will eventually be phased
    out, and there is no current use case supporting Eager Page Splitting on
    hosts where TDP is either disabled or unavailable in hardware.
    Furthermore, future improvements to nested MMU scalability may diverge
    the code from the legacy shadow paging implementation. These
    improvements will be simpler to make if Eager Page Splitting does not
    have to worry about legacy shadow paging.
    
    Splitting huge pages mapped by nested MMUs requires dealing with some
    extra complexity beyond that of the TDP MMU:
    
    (1) The shadow MMU has a limit on the number of shadow pages that are
        allowed to be allocated. So, as a policy, Eager Page Splitting
        refuses to split if there are KVM_MIN_FREE_MMU_PAGES or fewer
        pages available.
    
    (2) Splitting a huge page may end up re-using an existing lower level
        shadow page table. This is unlike the TDP MMU, which always
        allocates new shadow page tables when splitting.
    
    (3) When installing the lower level SPTEs, they must be added to the
        rmap which may require allocating additional pte_list_desc structs.
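The back-off policy in (1) amounts to a simple headroom check before each split. A minimal sketch, assuming a hypothetical helper name and an illustrative constant in place of KVM_MIN_FREE_MMU_PAGES:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for KVM_MIN_FREE_MMU_PAGES; the real floor
 * is defined by KVM, not this value. */
#define MIN_FREE_MMU_PAGES 5

/* Hypothetical predicate: refuse to split when the number of shadow
 * pages still available is at or below the floor, preserving the
 * reserve needed to service guest page faults. */
static bool may_split_huge_page(unsigned long free_mmu_pages)
{
	return free_mmu_pages > MIN_FREE_MMU_PAGES;
}
```

Splitting is opportunistic, so backing off here is safe: any huge page left unsplit is simply split later on write fault, as without Eager Page Splitting.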
    
    Case (2) is especially interesting since it may require a TLB flush,
    unlike the TDP MMU which can fully split huge pages without any TLB
    flushes. Specifically, an existing lower level page table may point to
    even lower level page tables that are not fully populated, effectively
    unmapping a portion of the huge page, which requires a flush.  As of
    this commit, a flush is always done after dropping the huge page
    and before installing the lower level page table.
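The ordering constraint for case (2) can be sketched as a three-step sequence. All function and step names below are hypothetical; only the relative order (drop, flush, install) comes from the description above.

```c
#include <assert.h>

enum split_step {
	DROP_HUGE_SPTE,      /* zap the huge mapping */
	TLB_FLUSH,           /* purge stale huge-page TLB entries */
	INSTALL_LOWER_TABLE, /* link in the (possibly partial) table */
};

static enum split_step steps[3];
static int nsteps;

static void record(enum split_step s)
{
	steps[nsteps++] = s;
}

/* Hypothetical sketch: when re-using an existing lower level page
 * table, the flush must happen between dropping the huge SPTE and
 * installing the table, because the re-used table may leave part of
 * the region unmapped. */
static void split_reusing_existing_table(void)
{
	record(DROP_HUGE_SPTE);
	record(TLB_FLUSH);
	record(INSTALL_LOWER_TABLE);
}
```

If the flush were deferred past the install, a vCPU could keep using a stale huge-page translation for a range the new table no longer maps.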
    
    This TLB flush could instead be delayed until the MMU lock is about to be
    dropped, which would batch flushes for multiple splits.  However these
    flushes should be rare in practice (a huge page must be aliased in
    multiple SPTEs and have been split for NX Huge Pages in only some of
    them). Flushing immediately is simpler to plumb and also reduces the
    chances of tripping over a CPU bug (e.g. see iTLB multihit).
    
    [ This commit is based off of the original implementation of Eager Page
      Splitting from Peter in Google's kernel from 2016. ]
    Suggested-by: Peter Feiner <pfeiner@google.com>
    Signed-off-by: David Matlack <dmatlack@google.com>
    Message-Id: <20220516232138.1783324-23-dmatlack@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>