1. 10 Sep, 2019 6 commits
  2. 09 Sep, 2019 1 commit
    • KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE · 92f35b75
      Marc Zyngier authored
      While parts of the VGIC support a large number of vcpus (we
      bravely allow up to 512), other parts are more limited.
      
      One of these limits is visible in the KVM_IRQ_LINE ioctl, which
      only allows 256 vcpus to be signalled when using the CPU or PPI
      types. Unfortunately, we've cornered ourselves badly by allocating
      all the bits in the irq field.
      
      Since the irq_type subfield (8 bits wide) is currently only taking
      the values 0, 1 and 2 (and we have been careful not to allow anything
      else), let's reduce this field to only 4 bits, and allocate the
      remaining 4 bits to a vcpu2_index, which acts as a multiplier:
      
        vcpu_id = 256 * vcpu2_index + vcpu_index
      
      With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)
      allowing this to be discovered, it becomes possible to inject
      PPIs to up to 4096 vcpus. But please just don't.
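A minimal sketch of the resulting encoding follows. The message only gives the field widths, so the bit positions used here (vcpu2_index in bits 31..28, irq_type in 27..24, vcpu_index in 23..16, interrupt number in 15..0) are an assumption based on the KVM API documentation, and the helper names `encode_irq` and `decode_vcpu_id` are hypothetical rather than kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions in the 32-bit irq argument of KVM_IRQ_LINE. */
#define SKETCH_VCPU2_SHIFT 28      /* 4 bits: vcpu2_index */
#define SKETCH_TYPE_SHIFT  24      /* 4 bits: irq_type (0, 1 or 2) */
#define SKETCH_VCPU_SHIFT  16      /* 8 bits: vcpu_index */
#define SKETCH_NUM_MASK    0xffffu /* 16 bits: interrupt number */

/* Split vcpu_id into vcpu2_index/vcpu_index and pack all fields. */
static uint32_t encode_irq(uint32_t irq_type, uint32_t vcpu_id, uint32_t irq_num)
{
    assert(irq_type <= 0xf);
    assert(vcpu_id < 4096);
    assert(irq_num <= SKETCH_NUM_MASK);

    uint32_t vcpu2_index = vcpu_id / 256;
    uint32_t vcpu_index = vcpu_id % 256;

    return (vcpu2_index << SKETCH_VCPU2_SHIFT) |
           (irq_type << SKETCH_TYPE_SHIFT) |
           (vcpu_index << SKETCH_VCPU_SHIFT) |
           irq_num;
}

/* Recover vcpu_id = 256 * vcpu2_index + vcpu_index from an encoded value. */
static uint32_t decode_vcpu_id(uint32_t irq)
{
    uint32_t vcpu2_index = (irq >> SKETCH_VCPU2_SHIFT) & 0xf;
    uint32_t vcpu_index = (irq >> SKETCH_VCPU_SHIFT) & 0xff;

    return 256 * vcpu2_index + vcpu_index;
}
```

For vcpu ids below 256 the vcpu2_index bits stay zero, so the encoded value is identical to what the old layout produced.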
      
      Whilst we're there, add a clarification about the use of KVM_IRQ_LINE
      on arm, which is not completely conditioned by KVM_CAP_IRQCHIP.
      Reported-by: Zenghui Yu <yuzenghui@huawei.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  3. 27 Aug, 2019 4 commits
    • arm64: KVM: Device mappings should be execute-never · e8688ba3
      James Morse authored
      Since commit 2f6ea23f ("arm64: KVM: Avoid marking pages as XN in
      Stage-2 if CTR_EL0.DIC is set"), KVM has stopped marking normal memory
      as execute-never at stage2 when the system supports D->I Coherency at
      the PoU. This avoids KVM taking a trap when the page is first executed,
      in order to clean it to PoU.
      
      The patch that added this change also wrapped PAGE_S2_DEVICE mappings
      up in this too. The upshot is, if your CPU caches support DIC ...
      you can execute devices.
      
      Revert the PAGE_S2_DEVICE change so PTE_S2_XN is always used
      directly.
      
      Fixes: 2f6ea23f ("arm64: KVM: Avoid marking pages as XN in Stage-2 if CTR_EL0.DIC is set")
      Signed-off-by: James Morse <james.morse@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: PPC: Book3S HV: Don't lose pending doorbell request on migration on P9 · ff42df49
      Paul Mackerras authored
      On POWER9, when userspace reads the value of the DPDES register on a
      vCPU, it is possible for 0 to be returned although there is a doorbell
      interrupt pending for the vCPU.  This can lead to a doorbell interrupt
      being lost across migration.  If the guest kernel uses doorbell
      interrupts for IPIs, then it could malfunction because of the lost
      interrupt.
      
      This happens because a newly-generated doorbell interrupt is signalled
      by setting vcpu->arch.doorbell_request to 1; the DPDES value in
      vcpu->arch.vcore->dpdes is not updated, because it can only be updated
      when holding the vcpu mutex, in order to avoid races.
      
      To fix this, we OR in vcpu->arch.doorbell_request when reading the
      DPDES value.
      
      Cc: stable@vger.kernel.org # v4.13+
      Fixes: 57900694 ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    • KVM: PPC: Book3S HV: Check for MMU ready on piggybacked virtual cores · d28eafc5
      Paul Mackerras authored
      When we are running multiple vcores on the same physical core, they
      could be from different VMs and so it is possible that one of the
      VMs could have its arch.mmu_ready flag cleared (for example by a
      concurrent HPT resize) when we go to run it on a physical core.
      We currently check the arch.mmu_ready flag for the primary vcore
      but not the flags for the other vcores that will be run alongside
      it.  This adds that check, and also a check when we select the
      secondary vcores from the preempted vcores list.
      
      Cc: stable@vger.kernel.org # v4.14+
      Fixes: 38c53af8 ("KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT updates")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S: Enable XIVE native capability only if OPAL has required functions · 2ad7a27d
      Paul Mackerras authored
      There are some POWER9 machines where the OPAL firmware does not support
      the OPAL_XIVE_GET_QUEUE_STATE and OPAL_XIVE_SET_QUEUE_STATE calls.
      The impact of this is that a guest using XIVE natively will not be able
      to be migrated successfully.  On the source side, the get_attr operation
      on the KVM native device for the KVM_DEV_XIVE_GRP_EQ_CONFIG attribute
      will fail; on the destination side, the set_attr operation for the same
      attribute will fail.
      
      This adds tests for the existence of the OPAL get/set queue state
      functions, and if they are not supported, the XIVE-native KVM device
      is not created and the KVM_CAP_PPC_IRQ_XIVE capability returns false.
      Userspace can then either provide a software emulation of XIVE, or
      else tell the guest that it does not have a XIVE controller available
      to it.
      
      Cc: stable@vger.kernel.org # v5.2+
      Fixes: 3fab2d10 ("KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode")
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  4. 25 Aug, 2019 2 commits
  5. 23 Aug, 2019 3 commits
    • KVM: PPC: Book3S HV: Define usage types for rmap array in guest memslot · d22deab6
      Suraj Jitindar Singh authored
      The rmap array in the guest memslot has one entry per guest page and
      is allocated at memslot creation time. Each rmap entry stores
      information about the guest page to which it corresponds. For an HPT
      guest, it stores a lock bit, RC bits, a present bit and the index of
      the entry in the guest HPT which maps this page. For a radix guest
      which is running nested guests, it stores a pointer to a linked list
      of nested rmap entries, each of which stores the nested guest physical
      address that maps this guest address and for which there is a PTE in
      the shadow page table.
      
      As there are currently two uses for the rmap array, and the potential
      for this to expand to more in the future, define a type field (being the
      top 8 bits of the rmap entry) to be used to define the type of the rmap
      entry which is currently present and define two values for this field
      for the two current uses of the rmap array.
      
      Since the nested case uses the rmap entry to store a pointer, define
      this type as having the two high bits set as is expected for a pointer.
      Define the hpt entry type as having bit 56 set (bit 7 IBM bit ordering).
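The type field described above can be sketched as follows. The macro and helper names are hypothetical, and treating the nested type as exactly `0xc0` in the top byte (the two high bits set, the other six clear) is an assumption drawn from the "as is expected for a pointer" remark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Top 8 bits of a 64-bit rmap entry hold the entry type. */
#define SKETCH_RMAP_TYPE_MASK 0xff00000000000000ULL
/* Nested entries hold a pointer: the two high bits are set. */
#define SKETCH_RMAP_NESTED    0xc000000000000000ULL
/* HPT entries: bit 56 set (bit 7 in IBM bit ordering). */
#define SKETCH_RMAP_HPT       0x0100000000000000ULL

/* Extract the type field, leaving the payload bits out. */
static uint64_t rmap_type(uint64_t entry)
{
    return entry & SKETCH_RMAP_TYPE_MASK;
}

static bool rmap_is_nested(uint64_t entry)
{
    return rmap_type(entry) == SKETCH_RMAP_NESTED;
}

static bool rmap_is_hpt(uint64_t entry)
{
    return rmap_type(entry) == SKETCH_RMAP_HPT;
}
```

Checking the whole type byte rather than a single bit keeps the remaining encodings free for future uses of the array.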
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S: Mark expected switch fall-through · ff7240cc
      Paul Menzel authored
      Fix the error below triggered by `-Wimplicit-fallthrough`, by tagging
      it as an expected fall-through.
      
          arch/powerpc/kvm/book3s_32_mmu.c: In function ‘kvmppc_mmu_book3s_32_xlate_pte’:
          arch/powerpc/kvm/book3s_32_mmu.c:241:21: error: this statement may fall through [-Werror=implicit-fallthrough=]
                pte->may_write = true;
                ~~~~~~~~~~~~~~~^~~~~~
          arch/powerpc/kvm/book3s_32_mmu.c:242:5: note: here
               case 3:
               ^~~~
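As an illustration of the tagging, here is a simplified, hypothetical permission decode (not the code from book3s_32_mmu.c). The `/* fall through */` comment before the next `case` label is one of the forms GCC accepts at its default `-Wimplicit-fallthrough` level.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission result, loosely modelled on a PTE. */
struct perms {
    bool may_read;
    bool may_write;
};

/* Writable encodings are also readable, so they fall into the read case. */
static struct perms decode_pp(unsigned int pp)
{
    struct perms p = { false, false };

    assert(pp < 8);
    switch (pp) {
    case 0:
    case 1:
    case 2:
        p.may_write = true;
        /* fall through */
    case 3:
        p.may_read = true;
        break;
    default:
        break;
    }
    return p;
}
```

Without the comment, the same code triggers the diagnostic shown above when built with `-Werror=implicit-fallthrough`.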
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next · 75bf465f
      Paul Mackerras authored
      This merges in fixes for the XIVE interrupt controller which touch both
      generic powerpc and PPC KVM code.  To avoid merge conflicts, these
      commits will go upstream via the powerpc tree as well as the KVM tree.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  6. 22 Aug, 2019 18 commits
  7. 21 Aug, 2019 3 commits
  8. 18 Aug, 2019 3 commits
    • KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence · 07ab0f8d
      Marc Zyngier authored
      When a vcpu is about to block by calling kvm_vcpu_block, we call
      back into the arch code to allow any form of synchronization that
      may be required at this point (SVM stops the AVIC, ARM synchronises
      the VMCR and enables GICv4 doorbells). But this synchronization
      comes in quite late, as we've potentially waited for halt_poll_ns
      to expire.
      
      Instead, let's move kvm_arch_vcpu_blocking() to the beginning of
      kvm_vcpu_block(), which on ARM has several benefits:
      
      - VMCR gets synchronised early, meaning that any interrupt delivered
        during the polling window will be evaluated with the correct guest
        PMR
      - GICv4 doorbells are enabled, which means that any guest interrupt
        directly injected during that window will be immediately recognised
      
      Tang Nianyao ran some tests on a GICv4 machine to evaluate such
      change, and reported up to a 10% improvement for netperf:
      
      <quote>
      	netperf result:
      	D06 as server, intel 8180 server as client
      	with change:
      	package 512 bytes - 5500 Mbits/s
      	package 64 bytes - 760 Mbits/s
      	without change:
      	package 512 bytes - 5000 Mbits/s
      	package 64 bytes - 710 Mbits/s
      </quote>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: arm/arm64: vgic: Make function comments match function declarations · 0ed5f5d6
      Alexandru Elisei authored
      Since commit 503a6286 ("KVM: arm/arm64: vgic: Rely on the GIC driver to
      parse the firmware tables"), the vgic_v{2,3}_probe functions stopped using
      a DT node. Commit 90977732 ("KVM: arm/arm64: vgic-new: vgic_init:
      implement kvm_vgic_hyp_init") changed the functions again, and now they
      require exactly one argument, a struct gic_kvm_info populated by the GIC
      driver. Unfortunately the comments regressed and state that a DT node is
      used instead. Change the function comments to reflect the current
      prototypes.
      Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • arm64/kvm: Remove VMID rollover I-cache maintenance · 363de99b
      Mark Rutland authored
      For VPIPT I-caches, we need I-cache maintenance on VMID rollover to
      avoid an ABA problem. Consider a single vCPU VM, with a pinned stage-2,
      running with an idmap VA->IPA and idmap IPA->PA. If we don't do
      maintenance on rollover:
      
              // VMID A
              Writes insn X to PA 0xF
              Invalidates PA 0xF (for VMID A)
      
              I$ contains [{A,F}->X]
      
              [VMID ROLLOVER]
      
              // VMID B
              Writes insn Y to PA 0xF
              Invalidates PA 0xF (for VMID B)
      
              I$ contains [{A,F}->X, {B,F}->Y]
      
              [VMID ROLLOVER]
      
              // VMID A
              I$ contains [{A,F}->X, {B,F}->Y]
      
              Unexpectedly hits stale I$ line {A,F}->X.
      
      However, for PIPT and VIPT I-caches, the VMID doesn't affect lookup or
      constrain maintenance. Given the VMID doesn't affect PIPT and VIPT
      I-caches, and given VMID rollover is independent of changes to stage-2
      mappings, I-cache maintenance cannot be necessary on VMID rollover for
      PIPT or VIPT I-caches.
      
      This patch removes the maintenance on rollover for VIPT and PIPT
      I-caches. At the same time, the unnecessary colons are removed from the
      asm statement to make it more legible.
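The ABA sequence for VPIPT I-caches can be replayed with a toy model. Everything below is a hypothetical sketch: a four-line cache whose lines are tagged with {VMID, PA}, where invalidation by PA only touches lines of the current VMID. It is not a description of real hardware or of the kernel's maintenance code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ICACHE_LINES 4

/* One toy I-cache line, tagged with {vmid, pa} as in a VPIPT cache. */
struct iline {
    bool valid;
    int vmid;
    unsigned long pa;
    char insn;
};

struct icache {
    struct iline line[ICACHE_LINES];
};

/* Allocate a free line for {vmid, pa} holding insn. */
static void icache_fill(struct icache *c, int vmid, unsigned long pa, char insn)
{
    for (size_t i = 0; i < ICACHE_LINES; i++) {
        if (!c->line[i].valid) {
            c->line[i] = (struct iline){ true, vmid, pa, insn };
            return;
        }
    }
    assert(!"toy cache is full");
}

/* Invalidate by PA: only lines tagged with the given VMID are affected. */
static void icache_inval(struct icache *c, int vmid, unsigned long pa)
{
    for (size_t i = 0; i < ICACHE_LINES; i++) {
        if (c->line[i].valid && c->line[i].vmid == vmid && c->line[i].pa == pa)
            c->line[i].valid = false;
    }
}

/* Maintenance on VMID rollover: drop every line regardless of tag. */
static void icache_inval_all(struct icache *c)
{
    for (size_t i = 0; i < ICACHE_LINES; i++)
        c->line[i].valid = false;
}

/* Return the cached instruction for {vmid, pa}, or 0 on a miss. */
static char icache_lookup(const struct icache *c, int vmid, unsigned long pa)
{
    for (size_t i = 0; i < ICACHE_LINES; i++) {
        if (c->line[i].valid && c->line[i].vmid == vmid && c->line[i].pa == pa)
            return c->line[i].insn;
    }
    return 0;
}
```

Replaying the sequence (invalidate and fill under VMID A, then under VMID B, then look up under A again) returns the stale `X` unless `icache_inval_all` runs at rollover. A PIPT or VIPT cache has no VMID in the tag, so the second invalidate would already have removed the first line.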
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Reviewed-by: James Morse <james.morse@arm.com>
      Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: kvmarm@lists.cs.columbia.edu
      Signed-off-by: Marc Zyngier <maz@kernel.org>