    KVM: x86/mmu: Map TDP MMU leaf SPTE iff target level is reached · 80a3e4ae
    Sean Christopherson authored
    Map the leaf SPTE when handling a TDP MMU page fault if and only if the
    target level is reached.  A recent commit reworked the retry logic and
    incorrectly assumed that walking SPTEs would never "fail", as the loop
    either bails (retries) or installs parent SPs.  However, the iterator
    itself will bail early if it detects a frozen (REMOVED) SPTE when
    stepping down.  The TDP iterator also rereads the current SPTE before
    stepping down specifically to avoid walking into a part of the tree that
    is being removed, which means it's possible to terminate the loop without
    the guts of the loop observing the frozen SPTE, e.g. if a different task
    zaps a parent SPTE between the initial read and try_step_down()'s refresh.
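    To make the race concrete, here is a minimal user-space model in C.  It is
    a sketch under heavy simplifying assumptions, not KVM code: the path is one
    SPTE per level, try_step_down() is a toy stand-in that injects the
    concurrent zap by freezing the SPTE only after the loop body has already
    seen it, and map(), struct path and the SPTE_* values are hypothetical.
    With fixed = false it models the logic described above (map at whatever
    level the walk stopped); with fixed = true it maps the leaf only if
    goal_level was reached and otherwise retries.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, heavily simplified model; none of these names are KVM's.
 * One SPTE per level along a single gfn's path: level 4 is the root,
 * level 1 is the 4KiB leaf level.
 */
#define ROOT_LEVEL   4
#define SPTE_NONE    0x0ULL
#define SPTE_TABLE   0x1ULL   /* present, points to a child page table */
#define SPTE_LEAF    0x3ULL   /* present leaf */
#define SPTE_FROZEN  0x5a0ULL /* stand-in for KVM's REMOVED (frozen) SPTE */

enum result { MAPPED, RETRY };

struct path {
	uint64_t spte[ROOT_LEVEL + 1]; /* indexed 1..ROOT_LEVEL */
	int zap_level;                 /* level a "different task" zaps, 0 = none */
};

/*
 * Models the iterator's refresh: reread the SPTE and refuse to descend if
 * it is now frozen.  The concurrent zap is injected here, i.e. after the
 * loop body already observed a normal SPTE at this level.
 */
static bool try_step_down(struct path *p, int level)
{
	if (p->zap_level == level)
		p->spte[level] = SPTE_FROZEN;
	return p->spte[level] != SPTE_FROZEN;
}

/*
 * Walk from the root toward goal_level.  Simplification: the real iterator
 * steps side/up after a failed step down before the loop ends; here the
 * walk simply stops at the current level.
 */
static enum result map(struct path *p, int goal_level, bool fixed,
		       int *mapped_level)
{
	int level = ROOT_LEVEL;

	while (level > goal_level) {
		if (p->spte[level] == SPTE_NONE)
			p->spte[level] = SPTE_TABLE; /* install a parent SP */
		if (!try_step_down(p, level))
			break;                       /* iterator bailed early */
		level--;
	}

	/* The fix: map the leaf iff the target level was actually reached. */
	if (fixed && level != goal_level)
		return RETRY;

	/* Without the check, this can install a leaf above goal_level. */
	p->spte[level] = SPTE_LEAF;
	*mapped_level = level;
	return MAPPED;
}
```

    In this model, a zap at level 2 during a level-1 fault makes the old logic
    install a leaf at level 2, while the fixed logic returns RETRY and leaves
    level 1 untouched.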
    
    Mapping a leaf SPTE at the wrong level results in all kinds of badness as
    page table walkers interpret the SPTE as a page table, not a leaf, and
    walk into the weeds.
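    A misplaced leaf is so destructive because a walker decides "leaf or
    table" from the entry's bits plus the level at which it finds the entry.
    The sketch below assumes x86-style encoding (bit 7 is the large-page bit
    above the 4KiB level, bits 12..51 hold the frame number).  is_last_level()
    is modeled on the shape of KVM's is_last_spte(); next_pfn() and the PT_*
    names are hypothetical.  A 4KiB-style leaf has bit 7 clear, so it is a
    leaf at level 1 but looks like a pointer to a page table at level 2, and
    the walker would treat the mapped data page as page table memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-style encoding; hypothetical names. */
#define PT_PAGE_SIZE  (1ULL << 7)             /* large-page bit above level 1 */
#define PT_ADDR_MASK  0x000ffffffffff000ULL   /* bits 12..51: frame number */

/* Leaf iff at the 4KiB level, or a large page above it. */
static bool is_last_level(uint64_t pte, int level)
{
	if (level == 1)
		return true;
	return pte & PT_PAGE_SIZE;
}

/*
 * Returns the frame the walker will use next: either the page it maps
 * (leaf) or the page it will read as the next-level table (non-leaf).
 */
static uint64_t next_pfn(uint64_t pte, int level, bool *is_leaf)
{
	*is_leaf = is_last_level(pte, level);
	return (pte & PT_ADDR_MASK) >> 12;
}
```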
    
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 1025 at arch/x86/kvm/mmu/tdp_mmu.c:1070 kvm_tdp_mmu_map+0x481/0x510
      Modules linked in: kvm_intel
      CPU: 1 PID: 1025 Comm: nx_huge_pages_t Tainted: G        W          6.1.0-rc4+ #64
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kvm_tdp_mmu_map+0x481/0x510
      RSP: 0018:ffffc9000072fba8 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: ffffc9000072fcc0 RCX: 0000000000000027
      RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff888277c5b4c8
      RBP: ffff888107d45a10 R08: ffff888277c5b4c0 R09: ffffc9000072fa48
      R10: 0000000000000001 R11: 0000000000000001 R12: ffffc9000073a0e0
      R13: ffff88810fc54800 R14: ffff888107d1ae60 R15: ffff88810fc54f90
      FS:  00007fba9f853740(0000) GS:ffff888277c40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 000000010aa7a003 CR4: 0000000000172ea0
      Call Trace:
       <TASK>
       kvm_tdp_page_fault+0x10c/0x130
       kvm_mmu_page_fault+0x103/0x680
       vmx_handle_exit+0x132/0x5a0 [kvm_intel]
       vcpu_enter_guest+0x60c/0x16f0
       kvm_arch_vcpu_ioctl_run+0x1e2/0x9d0
       kvm_vcpu_ioctl+0x271/0x660
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x2b/0x50
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
       </TASK>
      ---[ end trace 0000000000000000 ]---
      Invalid SPTE change: cannot replace a present leaf
      SPTE with another present leaf SPTE mapping a
      different PFN!
      as_id: 0 gfn: 100200 old_spte: 600000112400bf3 new_spte: 6000001126009f3 level: 2
      ------------[ cut here ]------------
      kernel BUG at arch/x86/kvm/mmu/tdp_mmu.c:559!
      invalid opcode: 0000 [#1] SMP
      CPU: 1 PID: 1025 Comm: nx_huge_pages_t Tainted: G        W          6.1.0-rc4+ #64
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:__handle_changed_spte.cold+0x95/0x9c
      RSP: 0018:ffffc9000072faf8 EFLAGS: 00010246
      RAX: 00000000000000c1 RBX: ffffc90000731000 RCX: 0000000000000027
      RDX: 0000000000000000 RSI: 00000000ffffdfff RDI: ffff888277c5b4c8
      RBP: 0600000112400bf3 R08: ffff888277c5b4c0 R09: ffffc9000072f9a0
      R10: 0000000000000001 R11: 0000000000000001 R12: 06000001126009f3
      R13: 0000000000000002 R14: 0000000012600901 R15: 0000000012400b01
      FS:  00007fba9f853740(0000) GS:ffff888277c40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 000000010aa7a003 CR4: 0000000000172ea0
      Call Trace:
       <TASK>
       kvm_tdp_mmu_map+0x3b0/0x510
       kvm_tdp_page_fault+0x10c/0x130
       kvm_mmu_page_fault+0x103/0x680
       vmx_handle_exit+0x132/0x5a0 [kvm_intel]
       vcpu_enter_guest+0x60c/0x16f0
       kvm_arch_vcpu_ioctl_run+0x1e2/0x9d0
       kvm_vcpu_ioctl+0x271/0x660
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x2b/0x50
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
       </TASK>
      Modules linked in: kvm_intel
      ---[ end trace 0000000000000000 ]---
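    Decoding the two SPTEs from the log shows why the check fired.  This
    sketch assumes the PFN lives in bits 12..51 and that bit 7 is the
    large-page bit; KVM's real address mask depends on the host's physical
    address width, and the helpers here are hypothetical.  Both values are
    large-page leaves at level 2, but old_spte maps PFN 0x112400 and new_spte
    maps PFN 0x112600, i.e. two different 2MiB-aligned frames.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding; hypothetical names. */
#define SPTE_ADDR_MASK  0x000ffffffffff000ULL /* bits 12..51 */
#define SPTE_LARGE      (1ULL << 7)           /* large-page bit */

/* Extract the page frame number, ignoring low flags and high bits. */
static uint64_t spte_pfn(uint64_t spte)
{
	return (spte & SPTE_ADDR_MASK) >> 12;
}

/* True if the entry is a large-page leaf (meaningful above level 1). */
static bool spte_is_large(uint64_t spte)
{
	return spte & SPTE_LARGE;
}
```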
    
    Fixes: 63d28a25 ("KVM: x86/mmu: simplify kvm_tdp_mmu_map flow when guest has to retry")
    Cc: Robert Hoo <robert.hu@linux.intel.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20221213033030.83345-3-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>