  1. 10 Sep, 2019 4 commits
    • Merge tag 'kvmarm-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · 32d1d15c
      Paolo Bonzini authored
      KVM/arm updates for 5.4
      
      - New ITS translation cache
      - Allow up to 512 CPUs to be supported with GICv3 (for real this time)
      - Now call kvm_arch_vcpu_blocking early in the blocking sequence
      - Tidy-up device mappings in S2 when DIC is available
      - Clean icache invalidation on VMID rollover
      - General cleanup
    • Merge tag 'kvm-ppc-next-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD · 8146856b
      Paolo Bonzini authored
      PPC KVM update for 5.4
      
      - Some prep for extending the uses of the rmap array
      - Various minor fixes
      - Commits from the powerpc topic/ppc-kvm branch, which fix a problem
        with interrupts arriving after free_irq, causing host hangs and crashes.
    • KVM: x86: Manually calculate reserved bits when loading PDPTRS · 16cfacc8
      Sean Christopherson authored
      Manually generate the PDPTR reserved bit mask when explicitly loading
      PDPTRs.  The reserved bits that are being tracked by the MMU reflect the
      current paging mode, which is unlikely to be PAE paging in the vast
      majority of flows that use load_pdptrs(), e.g. CR0 and CR4 emulation,
      __set_sregs(), etc...  This can cause KVM to incorrectly signal a bad
      PDPTR, or more likely, miss a reserved bit check and subsequently fail
      a VM-Enter due to a bad VMCS.GUEST_PDPTR.
      
      Add a one-off helper to generate the reserved bits instead of sharing
      code across the MMU's calculations and the PDPTR emulation.  The PDPTR
      reserved bits are basically set in stone, and pushing a helper into
      the MMU's calculation adds unnecessary complexity without improving
      readability.
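
      A sketch of such a helper, assuming KVM's existing rsvd_bits() and
      cpuid_maxphyaddr() helpers (PAE PDPTEs reserve bits 2:1, 8:5 and all
      bits at or above MAXPHYADDR):

          static u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu)
          {
                  return rsvd_bits(cpuid_maxphyaddr(vcpu), 63) |
                         rsvd_bits(5, 8) | rsvd_bits(1, 2);
          }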
      
      Opportunistically fix/update the comment for load_pdptrs().
      
      Note, the buggy commit also introduced a deliberate functional change,
      "Also remove bit 5-6 from rsvd_bits_mask per latest SDM.", which was
      effectively (and correctly) reverted by commit cd9ae5fe ("KVM: x86:
      Fix page-tables reserved bits").  A bit of SDM archaeology shows that
      the SDM from late 2008 had a bug (likely a copy+paste error) where it
      listed bits 6:5 as AVL and A for PDPTEs used for 4KB entries, but as
      reserved for 2MB entries.  I.e. the SDM contradicted itself; bits 6:5
      are, and always have been, reserved.
      
      Fixes: 20c466b5 ("KVM: Use rsvd_bits_mask in load_pdptrs()")
      Cc: stable@vger.kernel.org
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Reported-by: Doug Reiland <doug.reiland@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Disable posted interrupts for non-standard IRQ delivery modes · fdcf7562
      Alexander Graf authored
      We can easily route hardware interrupts directly into VM context when
      they target the "Fixed" or "LowPriority" delivery modes.
      
      However, for modes such as "SMI" or "Init", we need to go through KVM
      code to actually put the vCPU into a different mode of operation, so we
      cannot post the interrupt.
      
      Add code in the VMX and SVM PI logic to explicitly refuse to establish
      posted mappings for advanced IRQ delivery modes. This reflects the logic
      in __apic_accept_irq(), which also only ever passes Fixed and LowPriority
      interrupts as posted interrupts into the guest.
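
      A sketch of the predicate, mirroring __apic_accept_irq() (the helper
      name is illustrative):

          /* Only Fixed and LowPriority interrupts can be posted. */
          static bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
          {
                  return irq->delivery_mode == APIC_DM_FIXED ||
                         irq->delivery_mode == APIC_DM_LOWEST;
          }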
      
      This fixes a bug I have with code which configures real hardware to
      inject virtual SMIs into my guest.
      Signed-off-by: Alexander Graf <graf@amazon.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 09 Sep, 2019 1 commit
    • KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE · 92f35b75
      Marc Zyngier authored
      While parts of the VGIC support a large number of vcpus (we
      bravely allow up to 512), other parts are more limited.
      
      One of these limits is visible in the KVM_IRQ_LINE ioctl, which
      only allows 256 vcpus to be signalled when using the CPU or PPI
      types. Unfortunately, we've cornered ourselves badly by allocating
      all the bits in the irq field.
      
      Since the irq_type subfield (8 bit wide) is currently only taking
      the values 0, 1 and 2 (and we have been careful not to allow anything
      else), let's reduce this field to only 4 bits, and allocate the
      remaining 4 bits to a vcpu2_index, which acts as a multiplier:
      
        vcpu_id = 256 * vcpu2_index + vcpu_index
      
      With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)
      allowing this to be discovered, it becomes possible to inject
      PPIs to up to 4096 vcpus. But please just don't.
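
      A sketch of how the vcpu id is recovered under the new layout (field
      positions as described; treat as an assumption, the authoritative
      layout is in the KVM API documentation):

          /* bits 31..28: vcpu2_index  bits 27..24: irq_type
           * bits 23..16: vcpu_index   bits 15..0:  irq number
           */
          u32 field = line.irq;       /* line is a struct kvm_irq_level */
          u32 vcpu_id = 256 * ((field >> 28) & 0xf)   /* vcpu2_index */
                      + ((field >> 16) & 0xff);       /* vcpu_index  */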
      
      Whilst we're there, add a clarification about the use of KVM_IRQ_LINE
      on arm, which is not completely conditioned on KVM_CAP_IRQCHIP.
      Reported-by: Zenghui Yu <yuzenghui@huawei.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  3. 27 Aug, 2019 4 commits
    • arm64: KVM: Device mappings should be execute-never · e8688ba3
      James Morse authored
      Since commit 2f6ea23f ("arm64: KVM: Avoid marking pages as XN in
      Stage-2 if CTR_EL0.DIC is set"), KVM has stopped marking normal memory
      as execute-never at stage2 when the system supports D->I Coherency at
      the PoU. This avoids KVM taking a trap when the page is first executed,
      in order to clean it to PoU.
      
      The patch that added this change also wrapped PAGE_S2_DEVICE mappings
      up in this too. The upshot is, if your CPU caches support DIC ...
      you can execute devices.
      
      Revert the PAGE_S2_DEVICE change so PTE_S2_XN is always used
      directly.
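
      Roughly, the stage-2 protection definitions end up as follows (normal
      memory keeps the DIC-aware PAGE_S2_XN, device mappings use the raw
      PTE_S2_XN bit; a sketch, not the exact diff):

          #define PAGE_S2        __pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(NORMAL) | \
                                          PTE_S2_RDONLY | PAGE_S2_XN)
          #define PAGE_S2_DEVICE __pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(DEVICE_nGnRE) | \
                                          PTE_S2_RDONLY | PTE_S2_XN)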
      
      Fixes: 2f6ea23f ("arm64: KVM: Avoid marking pages as XN in Stage-2 if CTR_EL0.DIC is set")
      Signed-off-by: James Morse <james.morse@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: PPC: Book3S HV: Don't lose pending doorbell request on migration on P9 · ff42df49
      Paul Mackerras authored
      On POWER9, when userspace reads the value of the DPDES register on a
      vCPU, it is possible for 0 to be returned although there is a doorbell
      interrupt pending for the vCPU.  This can lead to a doorbell interrupt
      being lost across migration.  If the guest kernel uses doorbell
      interrupts for IPIs, then it could malfunction because of the lost
      interrupt.
      
      This happens because a newly-generated doorbell interrupt is signalled
      by setting vcpu->arch.doorbell_request to 1; the DPDES value in
      vcpu->arch.vcore->dpdes is not updated, because it can only be updated
      when holding the vcpu mutex, in order to avoid races.
      
      To fix this, we OR in vcpu->arch.doorbell_request when reading the
      DPDES value.
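
      A sketch of the fix in the one_reg read path (the exact context in
      book3s_hv.c may differ):

          case KVM_REG_PPC_DPDES:
                  /* fold in a doorbell not yet propagated to vcore->dpdes */
                  *val = get_reg_val(id, vcpu->arch.vcore->dpdes |
                                         vcpu->arch.doorbell_request);
                  break;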
      
      Cc: stable@vger.kernel.org # v4.13+
      Fixes: 57900694 ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    • KVM: PPC: Book3S HV: Check for MMU ready on piggybacked virtual cores · d28eafc5
      Paul Mackerras authored
      When we are running multiple vcores on the same physical core, they
      could be from different VMs and so it is possible that one of the
      VMs could have its arch.mmu_ready flag cleared (for example by a
      concurrent HPT resize) when we go to run it on a physical core.
      We currently check the arch.mmu_ready flag for the primary vcore
      but not the flags for the other vcores that will be run alongside
      it.  This adds that check, and also a check when we select the
      secondary vcores from the preempted vcores list.
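
      The shape of the added check, as a sketch (the loop over candidate
      preempted vcores is abridged; names illustrative):

          list_for_each_entry(pvc, &lp->list, preempt_list) {
                  if (!pvc->kvm->arch.mmu_ready)
                          continue;       /* this VM's HPT isn't ready */
                  /* existing suitability checks ... */
          }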
      
      Cc: stable@vger.kernel.org # v4.14+
      Fixes: 38c53af8 ("KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT updates")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S: Enable XIVE native capability only if OPAL has required functions · 2ad7a27d
      Paul Mackerras authored
      There are some POWER9 machines where the OPAL firmware does not support
      the OPAL_XIVE_GET_QUEUE_STATE and OPAL_XIVE_SET_QUEUE_STATE calls.
      The impact of this is that a guest using XIVE natively will not be able
      to be migrated successfully.  On the source side, the get_attr operation
      on the KVM native device for the KVM_DEV_XIVE_GRP_EQ_CONFIG attribute
      will fail; on the destination side, the set_attr operation for the same
      attribute will fail.
      
      This adds tests for the existence of the OPAL get/set queue state
      functions, and if they are not supported, the XIVE-native KVM device
      is not created and the KVM_CAP_PPC_IRQ_XIVE capability returns false.
      Userspace can then either provide a software emulation of XIVE, or
      else tell the guest that it does not have a XIVE controller available
      to it.
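
      A sketch of the availability test, assuming OPAL's opal_check_token()
      interface:

          bool xive_native_has_queue_state_support(void)
          {
                  return opal_check_token(OPAL_XIVE_GET_QUEUE_STATE) &&
                         opal_check_token(OPAL_XIVE_SET_QUEUE_STATE);
          }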
      
      Cc: stable@vger.kernel.org # v5.2+
      Fixes: 3fab2d10 ("KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode")
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  4. 25 Aug, 2019 2 commits
  5. 23 Aug, 2019 3 commits
    • KVM: PPC: Book3S HV: Define usage types for rmap array in guest memslot · d22deab6
      Suraj Jitindar Singh authored
      The rmap array in the guest memslot has one entry per guest page,
      allocated at memslot creation time. Each rmap entry stores information
      about the guest page to which it corresponds. For example, for an HPT
      guest it stores a lock bit, RC bits, a present bit and the index of the
      entry in the guest HPT which maps this page. For a radix guest running
      nested guests, it stores a pointer to a linked list of nested rmap
      entries, which record the nested guest physical addresses that map this
      guest address and for which there is a pte in the shadow page table.
      
      As there are currently two uses for the rmap array, with the potential
      for more in the future, define a type field (the top 8 bits of the rmap
      entry) identifying the kind of entry currently present, and define two
      values for this field for the two current uses of the rmap array.
      
      Since the nested case uses the rmap entry to store a pointer, define
      this type as having the two high bits set, as is expected for a pointer.
      Define the hpt entry type as having bit 56 set (bit 7 in IBM bit
      ordering).
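
      The encoding could look along these lines (a sketch based on the
      description above):

          /* The top 8 bits of an rmap entry encode its type. */
          #define KVMPPC_RMAP_TYPE_MASK  0xff00000000000000UL
          #define KVMPPC_RMAP_NESTED     0xc000000000000000UL /* pointer */
          #define KVMPPC_RMAP_HPT        0x0100000000000000UL /* bit 56 */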
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S: Mark expected switch fall-through · ff7240cc
      Paul Menzel authored
      Fix the error below, triggered by `-Wimplicit-fallthrough`, by
      annotating the fall-through as expected.
      
          arch/powerpc/kvm/book3s_32_mmu.c: In function ‘kvmppc_mmu_book3s_32_xlate_pte’:
          arch/powerpc/kvm/book3s_32_mmu.c:241:21: error: this statement may fall through [-Werror=implicit-fallthrough=]
                pte->may_write = true;
                ~~~~~~~~~~~~~~~^~~~~~
          arch/powerpc/kvm/book3s_32_mmu.c:242:5: note: here
               case 3:
               ^~~~
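
      The fix is the comment annotation GCC recognises, roughly (surrounding
      code abridged):

          case 2:
                  pte->may_write = true;
                  /* fall through */
          case 3:
                  pte->may_read = true;
                  break;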
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next · 75bf465f
      Paul Mackerras authored
      This merges in fixes for the XIVE interrupt controller which touch both
      generic powerpc and PPC KVM code.  To avoid merge conflicts, these
      commits will go upstream via the powerpc tree as well as the KVM tree.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  6. 22 Aug, 2019 18 commits
  7. 21 Aug, 2019 3 commits
  8. 18 Aug, 2019 5 commits
    • KVM: Call kvm_arch_vcpu_blocking early in the blocking sequence · 07ab0f8d
      Marc Zyngier authored
      When a vcpu is about to block by calling kvm_vcpu_block, we call
      back into the arch code to allow any form of synchronization that
      may be required at this point (SVM stops the AVIC, ARM synchronises
      the VMCR and enables GICv4 doorbells). But this synchronization
      comes in quite late, as we've potentially waited for halt_poll_ns
      to expire.
      
      Instead, let's move kvm_arch_vcpu_blocking() to the beginning of
      kvm_vcpu_block(), as sketched below; on ARM this has several benefits:
      
      - VMCR gets synchronised early, meaning that any interrupt delivered
        during the polling window will be evaluated with the correct guest
        PMR
      - GICv4 doorbells are enabled, which means that any guest interrupt
        directly injected during that window will be immediately recognised
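
      A sketch of the reordering in kvm_vcpu_block() (details elided):

          void kvm_vcpu_block(struct kvm_vcpu *vcpu)
          {
                  kvm_arch_vcpu_blocking(vcpu);   /* now first, before polling */

                  if (vcpu->halt_poll_ns) {
                          /* halt-poll window, with arch state already synced */
                  }

                  /* wait until a wakeup condition fires */

                  kvm_arch_vcpu_unblocking(vcpu);
          }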
      
      Tang Nianyao ran some tests on a GICv4 machine to evaluate this
      change, and reported up to a 10% improvement for netperf:
      
      <quote>
      	netperf result:
      	D06 as server, intel 8180 server as client
      	with change:
      	package 512 bytes - 5500 Mbits/s
      	package 64 bytes - 760 Mbits/s
      	without change:
      	package 512 bytes - 5000 Mbits/s
      	package 64 bytes - 710 Mbits/s
      </quote>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: arm/arm64: vgic: Make function comments match function declarations · 0ed5f5d6
      Alexandru Elisei authored
      Since commit 503a6286 ("KVM: arm/arm64: vgic: Rely on the GIC driver to
      parse the firmware tables"), the vgic_v{2,3}_probe functions stopped using
      a DT node. Commit 90977732 ("KVM: arm/arm64: vgic-new: vgic_init:
      implement kvm_vgic_hyp_init") changed the functions again, and now they
      require exactly one argument, a struct gic_kvm_info populated by the GIC
      driver. Unfortunately the comments regressed and state that a DT node is
      used instead. Change the function comments to reflect the current
      prototypes.
      Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • arm64/kvm: Remove VMID rollover I-cache maintenance · 363de99b
      Mark Rutland authored
      For VPIPT I-caches, we need I-cache maintenance on VMID rollover to
      avoid an ABA problem. Consider a single vCPU VM, with a pinned stage-2,
      running with an idmap VA->IPA and idmap IPA->PA. If we don't do
      maintenance on rollover:
      
              // VMID A
              Writes insn X to PA 0xF
              Invalidates PA 0xF (for VMID A)
      
              I$ contains [{A,F}->X]
      
              [VMID ROLLOVER]
      
              // VMID B
              Writes insn Y to PA 0xF
              Invalidates PA 0xF (for VMID B)
      
              I$ contains [{A,F}->X, {B,F}->Y]
      
              [VMID ROLLOVER]
      
              // VMID A
              I$ contains [{A,F}->X, {B,F}->Y]
      
              Unexpectedly hits stale I$ line {A,F}->X.
      
      However, for PIPT and VIPT I-caches, the VMID doesn't affect lookup or
      constrain maintenance. Given that, and given that VMID rollover is
      independent of changes to stage-2 mappings, I-cache maintenance cannot
      be necessary on VMID rollover for PIPT or VIPT I-caches.
      
      This patch removes the maintenance on rollover for VIPT and PIPT
      I-caches. At the same time, the unnecessary colons are removed from the
      asm statement to make it more legible.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Reviewed-by: James Morse <james.morse@arm.com>
      Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: kvmarm@lists.cs.columbia.edu
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: arm/arm64: vgic-irqfd: Implement kvm_arch_set_irq_inatomic · 41108170
      Marc Zyngier authored
      Now that we have a cache of MSI->LPI translations, it is pretty
      easy to implement kvm_arch_set_irq_inatomic (this cache can be
      parsed without sleeping).
      
      Hopefully, this will improve some LPI-heavy workloads.
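
      A sketch of the implementation, assuming the cache lookup helper added
      earlier in this series (helper names and return conventions
      approximate):

          int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
                                        struct kvm *kvm, int irq_source_id,
                                        int level, bool line_status)
          {
                  if (e->type == KVM_IRQ_ROUTING_MSI && level) {
                          struct kvm_msi msi;

                          kvm_populate_msi(e, &msi);
                          if (!vgic_its_inject_cached_translation(kvm, &msi))
                                  return 0;       /* cache hit, injected */
                  }
                  return -EWOULDBLOCK;            /* defer to sleeping path */
          }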
      Tested-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: arm/arm64: vgic-its: Check the LPI translation cache on MSI injection · 86a7dae8
      Marc Zyngier authored
      When performing an MSI injection, let's first check if the translation
      is already in the cache. If so, let's inject it quickly without
      going through the whole translation process.
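
      A sketch of the fast path tried ahead of the full ITS walk (helper
      names and return values approximate):

          int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
          {
                  /* Fast path: the translation is already cached. */
                  if (!vgic_its_inject_cached_translation(kvm, msi))
                          return 0;

                  /* Slow path: resolve the MSI via a full ITS translation. */
                  return vgic_its_trigger_msi(kvm, msi);
          }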
      Tested-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>