1. 01 Mar, 2022 13 commits
    • KVM: selftests: Add test to verify KVM handling of ICR · 85c68eb4
      Sean Christopherson authored
      The main thing that the selftest verifies is that KVM copies x2APIC's
      ICR[63:32] to/from ICR2 when userspace accesses the vAPIC page via
      KVM_{G,S}ET_LAPIC.  KVM previously split the x2APIC ICR into ICR+ICR2 at
      the time of the write (from the guest), so KVM must preserve that
      behavior for backwards compatibility between different versions of KVM.
      
      It also tests other invariants, e.g. that KVM clears the BUSY flag on
      ICR writes and that the reserved bits in ICR2 are dropped on writes from
      the guest.
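
      As a rough illustration of the invariants above, the ICR value a test
      could expect KVM_GET_LAPIC to return after a guest write can be computed
      as a pure function.  The helper name expected_icr_readback() is
      hypothetical (not taken from the selftest), and the xAPIC mask assumes
      the architectural layout in which only ICR2[31:24] holds the destination
      and ICR2[23:0] is reserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_ICR_BUSY 0x00001000u   /* ICR bit 12: delivery status */

/*
 * Hypothetical helper: the 64-bit ICR value expected on readback after the
 * guest writes `written`.  BUSY must read back as clear.  In xAPIC mode,
 * ICR2[23:0] (i.e. ICR[55:32]) is reserved and expected to be dropped.
 */
static uint64_t expected_icr_readback(uint64_t written, bool x2apic)
{
    uint64_t val = written & ~(uint64_t)APIC_ICR_BUSY;

    if (!x2apic)
        val &= 0xff000000ffffffffull;   /* keep ICR2[31:24] and ICR[31:0] */
    return val;
}
```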
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-12-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Make kvm_lapic_set_reg() a "private" xAPIC helper · b9964ee3
      Sean Christopherson authored
      Hide the lapic's "raw" write helper inside lapic.c to force non-APIC code
      to go through proper helpers when modifying the vAPIC state.  Keep the
      read helper visible to outsiders for now; refactoring KVM to hide it too
      is possible, it will just take more work to do so.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-11-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Treat x2APIC's ICR as a 64-bit register, not two 32-bit regs · a57a3168
      Sean Christopherson authored
      Emulate the x2APIC ICR as a single 64-bit register, as opposed to forking
      it across ICR and ICR2 as two 32-bit registers.  This mirrors hardware
      behavior for Intel's upcoming IPI virtualization support, which does not
      split the access.
      
      Previous versions of Intel's SDM and AMD's APM don't explicitly state
      exactly how ICR is reflected in the vAPIC page for x2APIC; KVM just
      happened to speculate incorrectly.
      
      Handling the upcoming behavior is necessary in order to maintain
      backwards compatibility with KVM_{G,S}ET_LAPIC, e.g. failure to shuffle
      the 64-bit ICR into ICR+ICR2, and vice versa, would break live migration
      if IPI virtualization support isn't symmetrical across the source and
      destination.
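
      To make the two layouts concrete, the sketch below uses a plain 4 KiB
      byte array as a stand-in for the vAPIC page.  Offsets 0x300 (ICR) and
      0x310 (ICR2) are the architectural xAPIC offsets; the helper names are
      hypothetical, not KVM's internal functions.  Keeping KVM_{G,S}ET_LAPIC
      compatible amounts to converting between the split pair and the single
      64-bit slot at 0x300.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define APIC_ICR  0x300   /* xAPIC register offsets in the 4 KiB page */
#define APIC_ICR2 0x310

static void put_reg(uint8_t *page, unsigned int off, uint32_t v)
{
    memcpy(page + off, &v, sizeof(v));
}

static uint32_t get_reg(const uint8_t *page, unsigned int off)
{
    uint32_t v;

    memcpy(&v, page + off, sizeof(v));
    return v;
}

/* Legacy split layout: ICR[31:0] at ICR, ICR[63:32] at ICR2. */
static void icr_split(uint8_t *page, uint64_t icr)
{
    put_reg(page, APIC_ICR, (uint32_t)icr);
    put_reg(page, APIC_ICR2, (uint32_t)(icr >> 32));
}

static uint64_t icr_merge(const uint8_t *page)
{
    return ((uint64_t)get_reg(page, APIC_ICR2) << 32) | get_reg(page, APIC_ICR);
}

/* 64-bit layout: the whole ICR is one 8-byte value at offset 0x300. */
static void icr_put64(uint8_t *page, uint64_t icr)
{
    memcpy(page + APIC_ICR, &icr, sizeof(icr));
}

static uint64_t icr_get64(const uint8_t *page)
{
    uint64_t v;

    memcpy(&v, page + APIC_ICR, sizeof(v));
    return v;
}
```

      On a little-endian host such as x86, the low dword of the 64-bit slot
      coincides with the legacy ICR register, so only the high half needs to
      be shuffled to and from ICR2.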
      
      Cc: Zeng Guang <guang.zeng@intel.com>
      Cc: Chao Gao <chao.gao@intel.com>
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-10-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Add helpers to handle 64-bit APIC MSR read/writes · 5429478d
      Sean Christopherson authored
      Add helpers to handle 64-bit APIC reads/writes via MSRs to deduplicate the
      x2APIC and Hyper-V code needed to service reads/writes to ICR.  Future
      support for IPI virtualization will add yet another path where KVM must
      handle 64-bit APIC MSR reads/writes (to ICR).
      
      Opportunistically fix the comment in the write path; ICR2 holds the
      destination (if there's no shorthand), not the vector.
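
      For context, the x2APIC MSR interface maps each xAPIC register offset to
      MSR 0x800 + (offset >> 4), so ICR (offset 0x300) becomes MSR 0x830 and is
      accessed as a single 64-bit EDX:EAX value; there is no separate ICR2 MSR.
      A minimal sketch of that arithmetic follows; the function names are
      hypothetical, and APIC_BASE_MSR mirrors the kernel macro of that name.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_BASE_MSR 0x800u
#define APIC_ICR      0x300u

/* x2APIC MSR index for an xAPIC MMIO offset. */
static uint32_t x2apic_msr(uint32_t offset)
{
    return APIC_BASE_MSR + (offset >> 4);
}

/* Inverse mapping: MSR index back to the xAPIC register offset. */
static uint32_t x2apic_offset(uint32_t msr)
{
    return (msr - APIC_BASE_MSR) << 4;
}

/*
 * Hypothetical: combine the WRMSR operands into one 64-bit ICR value.
 * For ICR, the high half (EDX) carries the destination, and the low
 * half (EAX) carries vector, delivery mode, shorthand, etc.
 */
static uint64_t msr_value(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}
```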
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-9-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Make kvm_lapic_reg_{read,write}() static · 70180052
      Sean Christopherson authored
      Make the low-level lapic read/write helpers static; any accesses to the
      local APIC from vendor code or non-APIC code should be routed through
      proper helpers.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: WARN if KVM emulates an IPI without clearing the BUSY flag · bd17f417
      Sean Christopherson authored
      WARN if KVM emulates an IPI without clearing the BUSY flag; failure to do
      so could hang the guest if it waits for the IPI to be sent.

      Opportunistically use the APIC_ICR_BUSY macro instead of open coding the
      magic number, and add a comment to clarify why kvm_recalculate_apic_map()
      is unconditionally invoked (it's really, really confusing for IPIs due to
      the existence of fast paths that don't trigger a potential recalc).
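
      Why a stale BUSY flag can hang a guest: xAPIC guests commonly spin on
      ICR's delivery-status bit (bit 12, APIC_ICR_BUSY = 0x1000) around sending
      an IPI.  The sketch below, with hypothetical function names, models the
      emulation side clearing the bit and the guest side polling it; the poll
      is bounded here only so the stale case shows up as an error rather than
      an actual hang.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_ICR_BUSY 0x00001000u   /* ICR bit 12: delivery status */

/*
 * Hypothetical emulation of a guest ICR write: the IPI is delivered
 * synchronously, so the value stored back must have BUSY clear.
 */
static uint32_t emulate_icr_write(uint32_t val)
{
    uint32_t stored = val & ~APIC_ICR_BUSY;

    /* ... deliver the IPI described by `stored` here ... */

    /* Analogue of the new WARN: emulation must not leave BUSY set. */
    assert(!(stored & APIC_ICR_BUSY));
    return stored;
}

/*
 * Guest-side pattern: spin until delivery status reads back as idle.
 * Returns 0 once idle, -1 if BUSY is still set after max_spins reads.
 */
static int wait_icr_idle(const volatile uint32_t *icr, int max_spins)
{
    while (*icr & APIC_ICR_BUSY) {
        if (--max_spins <= 0)
            return -1;
    }
    return 0;
}
```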
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Don't rewrite guest ICR on AVIC IPI virtualization failure · b51818af
      Sean Christopherson authored
      Don't bother rewriting the ICR value into the vAPIC page on an AVIC IPI
      virtualization failure; the access is a trap, i.e. the value has already
      been written to the vAPIC page.  The one caveat is if hardware left the
      BUSY flag set (which appears to happen somewhat arbitrarily), in which
      case go through the "nodecode" APIC-write path in order to clear the BUSY
      flag.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Use common kvm_apic_write_nodecode() for AVIC write traps · ed60920e
      Sean Christopherson authored
      Use the common kvm_apic_write_nodecode() to handle AVIC/APIC-write traps
      instead of open-coding the exact same logic.  This will allow making the
      low-level lapic helpers inaccessible outside of lapic.c.
      
      Opportunistically clean up the params to eliminate a bunch of svm=>vcpu
      reflection.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Use "raw" APIC register read for handling APIC-write VM-Exit · b031f104
      Sean Christopherson authored
      Use the "raw" helper to read the vAPIC register after an APIC-write trap
      VM-Exit.  Hardware is responsible for vetting the write, and the caller
      is responsible for sanitizing the offset.  This is a functional change,
      as it means KVM will consume whatever happens to be in the vAPIC page if
      the write was dropped by hardware.  But unless userspace deliberately
      wrote garbage into the vAPIC page via KVM_SET_LAPIC, the value should be
      zero since it's not writable by the guest.
      
      This aligns common x86 with SVM's AVIC logic, i.e. paves the way for
      using the nodecode path to handle APIC-write traps when AVIC is enabled.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Handle APIC-write offset wrangling in VMX code · b5ede3df
      Sean Christopherson authored
      Move the vAPIC offset adjustments done in the APIC-write trap path from
      common x86 to VMX in anticipation of using the nodecode path for SVM's
      AVIC.  The adjustment reflects hardware behavior, i.e. it's technically a
      property of VMX, not common x86.  SVM's AVIC behavior is identical, so
      it's a bit of a moot point; the goal is purely to make it easier to
      understand why the adjustment is OK.
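
      A sketch of the kind of adjustment being moved, assuming it is a mask of
      the APIC-write exit qualification down to a 16-byte-aligned register
      offset within the 4 KiB APIC page (APIC registers are spaced 16 bytes
      apart).  The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed adjustment: keep bits 11:4 of the exit qualification, yielding
 * a 16-byte-aligned register offset inside the 4 KiB APIC page.
 */
static uint32_t apic_write_offset(uint64_t exit_qualification)
{
    uint32_t offset = (uint32_t)(exit_qualification & 0xff0);

    assert(offset % 16 == 0 && offset < 4096);
    return offset;
}
```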
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Do not change ICR on write to APIC_SELF_IPI · d22a81b3
      Paolo Bonzini authored
      Emulating writes to SELF_IPI with a write to ICR has an unwanted side effect:
      the value of ICR in the vAPIC page gets changed.  The SDM lists SELF_IPI as
      write-only, with no associated MMIO offset, so any write should have no
      visible side effect in the vAPIC page.
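
      A toy model contrasting the two approaches, assuming a plain array for
      the vAPIC page and hypothetical function names: routing SELF_IPI
      through an ICR write leaves the self-IPI command in the ICR slot, while
      sending the IPI directly leaves the page untouched.  APIC_DEST_SELF
      (0x40000, the "self" destination shorthand) and APIC_VECTOR_MASK follow
      the kernel's apicdef.h values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define APIC_ICR         0x300
#define APIC_DEST_SELF   0x40000u   /* ICR[19:18] = 01: "self" shorthand */
#define APIC_VECTOR_MASK 0xffu

struct toy_apic {
    uint32_t regs[4096 / 4];   /* stand-in for the vAPIC page */
    uint32_t last_ipi;         /* records the last IPI "sent" */
};

/* Hypothetical IPI sender: just records the ICR-format command. */
static void send_ipi(struct toy_apic *apic, uint32_t icr_low)
{
    apic->last_ipi = icr_low;
}

/* Old approach: emulate SELF_IPI as an ICR write, clobbering ICR. */
static void self_ipi_via_icr(struct toy_apic *apic, uint32_t val)
{
    uint32_t icr = APIC_DEST_SELF | (val & APIC_VECTOR_MASK);

    apic->regs[APIC_ICR / 4] = icr;
    send_ipi(apic, icr);
}

/* New approach: send the self-IPI directly; the page is not modified. */
static void self_ipi_direct(struct toy_apic *apic, uint32_t val)
{
    send_ipi(apic, APIC_DEST_SELF | (val & APIC_VECTOR_MASK));
}
```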
      Reported-by: Chao Gao <chao.gao@intel.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fix emulation in writing cr8 · f66af9f2
      Zhenzhong Duan authored
      When emulating a write to CR8, KVM preserves one of the four low-order
      TPR bits, TPR[3:0], instead of clearing all four.

      According to Intel SDM 10.8.6.1 (bare-metal scenario):
      "APIC.TPR[bits 7:4] = CR8[bits 3:0], APIC.TPR[bits 3:0] = 0";

      and SDM 28.3 (TPR shadow in use):
      "MOV to CR8. The instruction stores bits 3:0 of its source operand into
      bits 7:4 of VTPR; the remainder of VTPR (bits 3:0 and bits 31:8) are
      cleared.";
      
      and AMD's APM 16.6.4:
      "Task Priority Sub-class (TPS)-Bits 3 : 0. The TPS field indicates the
      current sub-priority to be used when arbitrating lowest-priority messages.
      This field is written with zero when TPR is written using the architectural
      CR8 register.";
      
      so in the KVM-emulated scenario, clear TPR[3:0] as well, for behavior
      consistent with the other scenarios.

      This doesn't impact evaluation and delivery of pending virtual interrupts
      because the processor does not use the processor-priority sub-class to
      determine which interrupts to deliver and which to inhibit.

      The sub-class is used by hardware to arbitrate lowest-priority interrupts,
      but KVM simply does round-robin delivery.
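
      Expressed as code, the architectural CR8/TPR relationship quoted above
      is a mask plus a shift.  This is a sketch of the corrected behavior only;
      the function names are hypothetical and it is not KVM's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* MOV to CR8: TPR[7:4] = CR8[3:0], TPR[3:0] = 0. */
static uint32_t tpr_from_cr8(uint64_t cr8)
{
    return (uint32_t)(cr8 & 0x0f) << 4;
}

/* MOV from CR8 returns TPR[7:4]; the sub-class bits are not visible. */
static uint64_t cr8_from_tpr(uint32_t tpr)
{
    return (tpr >> 4) & 0x0f;
}
```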
      
      Fixes: b93463aa ("KVM: Accelerated apic support")
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220210094506.20181-1-zhenzhong.duan@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: flush TLB separately from MMU reset · b5f61c03
      Paolo Bonzini authored
      For both CR0 and CR4, disassociate the TLB flush logic from the
      MMU role logic.  Instead of relying on kvm_mmu_reset_context() being
      a superset of various TLB flushes (which will not necessarily be
      the case in the future), always call it if the role changes, but
      also set the various TLB flush requests according to what the
      manual specifies.
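
      As an illustration of setting TLB flush requests according to the
      manual, here is a hypothetical CR4 decision helper paraphrasing the
      SDM's MOV-to-CR4 invalidation rules as we read them (CR0 and the MMU
      role reset are not modeled).  The enum and function names are made up
      for this sketch; the CR4 bit positions are architectural.

```c
#include <assert.h>
#include <stdint.h>

#define X86_CR4_PAE   (1ul << 5)
#define X86_CR4_PGE   (1ul << 7)
#define X86_CR4_PCIDE (1ul << 17)
#define X86_CR4_SMEP  (1ul << 20)

enum tlb_flush { FLUSH_NONE, FLUSH_CURRENT_PCID, FLUSH_ALL };

/*
 * Hypothetical helper:
 *  - toggling PGE, or clearing PCIDE, flushes everything (globals, all PCIDs);
 *  - toggling PAE, or setting SMEP, flushes the current PCID.
 */
static enum tlb_flush cr4_tlb_flush(unsigned long old_cr4, unsigned long new_cr4)
{
    unsigned long changed = old_cr4 ^ new_cr4;

    if ((changed & X86_CR4_PGE) ||
        ((old_cr4 & X86_CR4_PCIDE) && !(new_cr4 & X86_CR4_PCIDE)))
        return FLUSH_ALL;
    if ((changed & X86_CR4_PAE) ||
        (!(old_cr4 & X86_CR4_SMEP) && (new_cr4 & X86_CR4_SMEP)))
        return FLUSH_CURRENT_PCID;
    return FLUSH_NONE;
}
```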
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 25 Feb, 2022 23 commits
  3. 24 Feb, 2022 1 commit
  4. 22 Feb, 2022 3 commits
    • Merge tag 'kvm-s390-next-5.18-1' of... · 08288241
      Paolo Bonzini authored
      Merge tag 'kvm-s390-next-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      KVM: s390: Changes for 5.18 part1
      
      - add Claudio as Maintainer
      - first step to do proper storage key checking
      - testcase for missing memop check
    • KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3 · 93b71801
      Nicholas Piggin authored
      Add KVM_CAP_PPC_AIL_MODE_3 to advertise the capability to set the AIL
      resource mode to 3 with the H_SET_MODE hypercall. This capability
      differs between processor types and KVM types (PR, HV, Nested HV), and
      affects guest-visible behaviour.
      
      QEMU will implement a cap-ail-mode-3 to control this behaviour[1], and
      use the KVM CAP if available to determine KVM support[2].
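
      Userspace probes for a capability with KVM_CHECK_EXTENSION, which
      returns a positive value when the capability is available and 0 when
      it is not.  A minimal sketch follows; the helper name is hypothetical,
      and the #ifndef fallback only covers headers older than this commit.
      Because support differs between KVM types, a VMM would more likely
      issue the ioctl on the VM file descriptor (permitted when
      KVM_CAP_CHECK_EXTENSION_VM is present) than on /dev/kvm.

```c
#include <assert.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>

#ifndef KVM_CAP_PPC_AIL_MODE_3
#define KVM_CAP_PPC_AIL_MODE_3 210   /* number reserved by this commit */
#endif

/*
 * Returns 1 if KVM reports AIL mode 3 support on `fd`, else 0.
 * ioctl() errors (e.g. a bad fd) are treated as "unsupported".
 */
static int kvm_has_ail_mode_3(int fd)
{
    int ret = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_AIL_MODE_3);

    return ret > 0;
}
```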
      Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: s390: Add missing vm MEM_OP size check · 3d9042f8
      Janis Schoetterl-Glausch authored
      Check that size is not zero, preventing the following warning:
      
      WARNING: CPU: 0 PID: 9692 at mm/vmalloc.c:3059 __vmalloc_node_range+0x528/0x648
      Modules linked in:
      CPU: 0 PID: 9692 Comm: memop Not tainted 5.17.0-rc3-e4+ #80
      Hardware name: IBM 8561 T01 701 (LPAR)
      Krnl PSW : 0704c00180000000 0000000082dc584c (__vmalloc_node_range+0x52c/0x648)
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
      Krnl GPRS: 0000000000000083 ffffffffffffffff 0000000000000000 0000000000000001
                 0000038000000000 000003ff80000000 0000000000000cc0 000000008ebb8000
                 0000000087a8a700 000000004040aeb1 000003ffd9f7dec8 000000008ebb8000
                 000000009d9b8000 000000000102a1b4 00000380035afb68 00000380035afaa8
      Krnl Code: 0000000082dc583e: d028a7f4ff80        trtr    2036(41,%r10),3968(%r15)
                 0000000082dc5844: af000000            mc      0,0
                #0000000082dc5848: af000000            mc      0,0
                >0000000082dc584c: a7d90000            lghi    %r13,0
                 0000000082dc5850: b904002d            lgr     %r2,%r13
                 0000000082dc5854: eb6ff1080004        lmg     %r6,%r15,264(%r15)
                 0000000082dc585a: 07fe                bcr     15,%r14
                 0000000082dc585c: 47000700            bc      0,1792
      Call Trace:
       [<0000000082dc584c>] __vmalloc_node_range+0x52c/0x648
       [<0000000082dc5b62>] vmalloc+0x5a/0x68
       [<000003ff8067f4ca>] kvm_arch_vm_ioctl+0x2da/0x2a30 [kvm]
       [<000003ff806705bc>] kvm_vm_ioctl+0x4ec/0x978 [kvm]
       [<0000000082e562fe>] __s390x_sys_ioctl+0xbe/0x100
       [<000000008360a9bc>] __do_syscall+0x1d4/0x200
       [<0000000083618bd2>] system_call+0x82/0xb0
      Last Breaking-Event-Address:
       [<0000000082dc5348>] __vmalloc_node_range+0x28/0x648
      
      Other than the warning, there is no ill effect from the missing check;
      the condition is detected by subsequent code and causes a return
      with ENOMEM.
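
      The shape of the fix, as a userspace sketch: validate size before
      allocating, so a zero length never reaches the allocator.  The struct,
      the helper names, and the choice of -EINVAL for a zero size are
      illustrative assumptions; MEM_OP_MAX_SIZE = 65536 is assumed as the
      per-call limit, and malloc() stands in for vmalloc().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MEM_OP_MAX_SIZE 65536   /* assumed upper bound for one transfer */

struct mem_op_req {             /* hypothetical subset of the ioctl argument */
    uint64_t gaddr;
    uint32_t size;
};

/* Validate before allocating, so zero never reaches the allocator. */
static int mem_op_check(const struct mem_op_req *mop)
{
    if (mop->size == 0)
        return -EINVAL;
    if (mop->size > MEM_OP_MAX_SIZE)
        return -E2BIG;
    return 0;
}

static int mem_op(const struct mem_op_req *mop)
{
    int ret = mem_op_check(mop);
    void *buf;

    if (ret)
        return ret;
    buf = malloc(mop->size);    /* userspace stand-in for vmalloc() */
    if (!buf)
        return -ENOMEM;
    /* ... copy to/from guest memory would happen here ... */
    free(buf);
    return 0;
}
```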
      
      Fixes: ef11c946 (KVM: s390: Add vm IOCTL for key checked guest absolute memory access)
      Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220221163237.4122868-1-scgl@linux.ibm.com
      Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>