1. 08 Dec, 2016 4 commits
    • KVM: x86: Add kvm_skip_emulated_instruction and use it. · 6affcbed
      Kyle Huey authored
      kvm_skip_emulated_instruction calls both
      kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep,
      skipping the emulated instruction and generating a trap if necessary.
      
      Replacing skip_emulated_instruction calls with
      kvm_skip_emulated_instruction is straightforward, except for:
      
      - ICEBP, which is already inside a trap, so triggering another trap is avoided.
      - Instructions that can trigger exits to userspace, such as the IO insns,
        MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a
        KVM_GUESTDBG_SINGLESTEP exit, and the handling code for
        IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will
        take precedence. The singlestep will be triggered again on the next
        instruction, which is the current behavior.
      - Task switch instructions which would require additional handling (e.g.
        the task switch bit) and are instead left alone.
      - Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction,
        which do not trigger singlestep traps as mentioned previously.
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
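A minimal C sketch of the helper described above, assuming a drastically simplified vCPU. None of these types or functions are KVM's: `struct vcpu_sketch`, `skip_insn()`, `check_singlestep()` and `skip_and_check()` are hypothetical stand-ins. The instruction is skipped first; then, if the guest was single-stepping (TF set), either a #DB is queued for the guest or, when userspace requested KVM_GUESTDBG_SINGLESTEP, 0 is returned so that the caller exits to userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified vCPU state (not KVM's struct kvm_vcpu). */
struct vcpu_sketch {
	uint64_t rip;        /* guest instruction pointer */
	unsigned insn_len;   /* length of the instruction being emulated */
	bool tf;             /* EFLAGS.TF before the instruction was skipped */
	bool guestdbg_step;  /* userspace requested KVM_GUESTDBG_SINGLESTEP */
	bool db_queued;      /* a #DB trap was queued for the guest */
	bool user_exit;      /* a debug exit to userspace was requested */
};

/* Stand-in for kvm_x86_ops->skip_emulated_instruction: advance RIP only. */
static void skip_insn(struct vcpu_sketch *v)
{
	v->rip += v->insn_len;
}

/* Stand-in for kvm_vcpu_check_singlestep. Returns 1 to keep running the
 * guest, 0 when the caller must exit to userspace. */
static int check_singlestep(struct vcpu_sketch *v)
{
	if (!v->tf)
		return 1;            /* not single-stepping: nothing to do */
	if (v->guestdbg_step) {
		v->user_exit = true; /* the host debugger owns the single step */
		return 0;
	}
	v->db_queued = true;     /* the guest single-steps itself: inject #DB */
	return 1;
}

/* Sketch of kvm_skip_emulated_instruction: skip, then check for a trap. */
static int skip_and_check(struct vcpu_sketch *v)
{
	skip_insn(v);
	return check_singlestep(v);
}
```

An exit handler would end with `return skip_and_check(vcpu);` so that a 0 reaches the run loop, while ICEBP would keep calling the plain `skip_insn()`.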
    • KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 · eb277562
      Kyle Huey authored
      We can't return both the pass/fail boolean for the vmcs and the upcoming
      continue/exit-to-userspace boolean for skip_emulated_instruction out of
      nested_vmx_check_vmcs12, so move skip_emulated_instruction out of it instead.
      
      Additionally, VMLAUNCH/VMRESUME only trigger singlestep exceptions when
      they advance the IP to the following instruction, not when they a) succeed,
      b) fail MSR validation or c) throw an exception. Add a separate call to
      skip_emulated_instruction that will later not be converted to the variant
      that checks the singlestep flag.
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
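The reasoning above, that the check cannot conveniently return both the VMCS pass/fail result and the continue/exit value, can be illustrated with a hypothetical handler. In this sketch, which does not use the real KVM names or VMfail encoding, `vmcs12_ok()` returns only the validity result, and the caller performs the skip on both paths and returns its value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical nested-VMX state for illustration only. */
struct nvcpu {
	unsigned long rip;
	unsigned insn_len;
	bool vmcs12_valid;   /* does L1 have a valid current VMCS? */
	bool vmfail;         /* set when the instruction reports VMfail */
	bool executed;       /* set when the instruction's work was done */
};

/* Check only: returns pass/fail and does not touch RIP. */
static bool vmcs12_ok(const struct nvcpu *v)
{
	return v->vmcs12_valid;
}

/* Skip helper: returns 1 to continue, 0 to exit to userspace. This sketch
 * always continues; a singlestep-aware variant could return 0. */
static int skip_insn(struct nvcpu *v)
{
	v->rip += v->insn_len;
	return 1;
}

/* Hypothetical handler for an instruction that needs a valid VMCS12. */
static int handle_vmcs12_insn(struct nvcpu *v)
{
	if (!vmcs12_ok(v)) {
		v->vmfail = true;       /* report the failure to L1 ... */
		return skip_insn(v);    /* ... and still advance RIP */
	}
	v->executed = true;
	return skip_insn(v);
}
```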
    • KVM: VMX: Reorder some skip_emulated_instruction calls · 09ca3f20
      Kyle Huey authored
      The functions being moved ahead of skip_emulated_instruction here don't
      need updated IPs, and skipping the emulated instruction at the end will
      make it easier to return its value.
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: x86: Add a return value to kvm_emulate_cpuid · 6a908b62
      Kyle Huey authored
      Once skipping the emulated instruction can potentially trigger an exit to
      userspace (via KVM_GUESTDBG_SINGLESTEP), kvm_emulate_cpuid will need to
      propagate a return value.
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  2. 06 Dec, 2016 1 commit
  3. 01 Dec, 2016 1 commit
  4. 29 Nov, 2016 2 commits
  5. 28 Nov, 2016 6 commits
  6. 24 Nov, 2016 5 commits
  7. 23 Nov, 2016 13 commits
    • KVM: PPC: Book3S HV: Update kvmppc_set_arch_compat() for ISA v3.00 · 2ee13be3
      Suraj Jitindar Singh authored
      The function kvmppc_set_arch_compat() is used to determine the value of the
      processor compatibility register (PCR) for a guest running in a given
      compatibility mode. There is currently no support for v3.00 of the ISA.
      
      Add support for v3.00 of the ISA, which adds an ISA v2.07 compatibility mode
      to the PCR.
      
      We also add a check to ensure the processor we are running on is capable of
      emulating the chosen processor (for example a POWER7 cannot emulate a
      POWER8, similarly with a POWER8 and a POWER9).
      
      Based on work by: Paul Mackerras <paulus@ozlabs.org>
      
      [paulus@ozlabs.org - moved dummy PCR_ARCH_300 definition here; set
       guest_pcr_bit when arch_compat == 0, added comment.]
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
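A sketch of the compatibility check described above. The level values and `set_arch_compat_sketch()` are hypothetical: each ISA level gets one bit and newer levels use higher bits, so "the guest asks for a newer level than the host supports" becomes a simple comparison. Subtracting the guest bit from the host bit then yields every level bit from the guest's level up to, but not including, the host's; treat that mask encoding as an illustrative assumption rather than the real PCR layout.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical one-bit-per-level encoding (not the real PCR values). */
enum {
	LVL_205 = 1u << 1,   /* ISA v2.05 */
	LVL_206 = 1u << 2,   /* ISA v2.06, POWER7 */
	LVL_207 = 1u << 3,   /* ISA v2.07, POWER8 */
	LVL_300 = 1u << 4,   /* ISA v3.00, POWER9 */
};

/* host_lvl: level the host CPU implements. guest_lvl: requested compat
 * level, or 0 for "no compat mode". On success *mask holds one bit per
 * level from the guest's level up to (not including) the host's. */
static int set_arch_compat_sketch(unsigned host_lvl, unsigned guest_lvl,
				  unsigned *mask)
{
	if (guest_lvl == 0)
		guest_lvl = host_lvl;       /* run at the host's own level */
	if (guest_lvl > host_lvl)
		return -EINVAL;             /* e.g. POWER8 cannot emulate POWER9 */
	*mask = host_lvl - guest_lvl;   /* bits guest_lvl .. host_lvl-1 */
	return 0;
}
```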
    • KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcores · 45c940ba
      Paul Mackerras authored
      With POWER9, each CPU thread has its own MMU context and can be
      in the host or a guest independently of the other threads; there is
      still however a restriction that all threads must use the same type
      of address translation, either radix tree or hashed page table (HPT).
      
      Since we only support HPT guests on a HPT host at this point, we
      can treat the threads as being independent, and avoid all of the
      work of coordinating the CPU threads.  To make this simpler, we
      introduce a new threads_per_vcore() function that returns 1 on
      POWER9 and threads_per_subcore on POWER7/8, and use that instead
      of threads_per_subcore or threads_per_core in various places.
      
      This also changes the value of the KVM_CAP_PPC_SMT capability on
      POWER9 systems from 4 to 1, so that userspace will not try to
      create VMs with multiple vcpus per vcore.  (If userspace did create
      a VM that thought it was in an SMT mode, the VM might try to use
      the msgsndp instruction, which will not work as expected.  In
      future it may be possible to trap and emulate msgsndp in order to
      allow VMs to think they are in an SMT mode, if only for the purpose
      of allowing migration from POWER8 systems.)
      
      With all this, we can now run guests on POWER9 as long as the host
      is running with HPT translation.  Since userspace currently has no
      way to request radix tree translation for the guest, the guest has
      no choice but to use HPT translation.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
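A sketch of the `threads_per_vcore()` idea, assuming a hypothetical `struct host_sketch` in place of the kernel's CPU-feature checks and global `threads_per_subcore`. Deriving the advertised SMT value from the same helper only illustrates the 4 to 1 change on POWER9; it is not necessarily how KVM computes KVM_CAP_PPC_SMT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host description; the kernel instead consults CPU feature
 * bits and the global threads_per_subcore. */
struct host_sketch {
	bool power9;
	int threads_per_subcore;
};

/* On POWER9 every thread is treated as its own virtual core. */
static int threads_per_vcore_sketch(const struct host_sketch *h)
{
	return h->power9 ? 1 : h->threads_per_subcore;
}

/* SMT value advertised to userspace, derived from the same helper. */
static int smt_cap_sketch(const struct host_sketch *h)
{
	return threads_per_vcore_sketch(h);
}
```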
    • KVM: PPC: Book3S HV: Enable hypervisor virtualization interrupts while in guest · 84f7139c
      Paul Mackerras authored
      The new XIVE interrupt controller on POWER9 can direct external
      interrupts to the hypervisor or the guest.  The interrupts directed to
      the hypervisor are controlled by an LPCR bit called LPCR_HVICE, and
      come in as a "hypervisor virtualization interrupt".  This sets the
      LPCR bit so that hypervisor virtualization interrupts can occur while
      we are in the guest.  We then also need to cope with exiting the guest
      because of a hypervisor virtualization interrupt.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Use stop instruction rather than nap on POWER9 · bf53c88e
      Paul Mackerras authored
      POWER9 replaces the various power-saving mode instructions on POWER8
      (doze, nap, sleep and rvwinkle) with a single "stop" instruction, plus
      a register, PSSCR, which controls the depth of the power-saving mode.
      This replaces the use of the nap instruction when threads are idle
      during guest execution with the stop instruction, and adds code to
      set PSSCR to a value which will allow an SMT mode switch while the
      thread is idle (given that the core as a whole won't be idle in these
      cases).
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9 · f725758b
      Paul Mackerras authored
      POWER9 includes a new interrupt controller, called XIVE, which is
      quite different from the XICS interrupt controller on POWER7 and
      POWER8 machines.  KVM-HV accesses the XICS directly in several places
      in order to send and clear IPIs and handle interrupts from PCI
      devices being passed through to the guest.
      
      In order to make the transition to XIVE easier, OPAL firmware will
      include an emulation of XICS on top of XIVE.  Access to the emulated
      XICS is via OPAL calls.  The one complication is that the EOI
      (end-of-interrupt) function can now return a value indicating that
      another interrupt is pending; in this case, the XIVE will not signal
      an interrupt in hardware to the CPU, and software is supposed to
      acknowledge the new interrupt without waiting for another interrupt
      to be delivered in hardware.
      
      This adapts KVM-HV to use the OPAL calls on machines where there is
      no XICS hardware.  When there is no XICS, we look for a device-tree
      node with "ibm,opal-intc" in its compatible property, which is how
      OPAL indicates that it provides XICS emulation.
      
      In order to handle the EOI return value, kvmppc_read_intr() has
      become kvmppc_read_one_intr(), with a boolean variable passed by
      reference which can be set by the EOI functions to indicate that
      another interrupt is pending.  The new kvmppc_read_intr() keeps
      calling kvmppc_read_one_intr() until there are no more interrupts
      to process.  The return value from kvmppc_read_intr() is the
      largest non-zero value of the returns from kvmppc_read_one_intr().
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
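The new read loop can be sketched as follows. The queue-based `struct intc_sketch` and both functions are hypothetical stand-ins for the OPAL/XICS accesses; the sketch only shows the control flow: call the one-interrupt reader until the EOI stops reporting a pending interrupt, and keep the largest non-zero return value (a negative value survives only if nothing larger was seen).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue standing in for the (emulated) interrupt controller:
 * each entry is the status the handler returns for that interrupt. */
struct intc_sketch {
	int status[8];
	int head, tail;
};

/* Stand-in for kvmppc_read_one_intr(): handle at most one interrupt and
 * set *again when the EOI reports that another one is already pending. */
static int read_one_intr(struct intc_sketch *ic, bool *again)
{
	*again = false;
	if (ic->head == ic->tail)
		return 0;                       /* nothing pending */
	int ret = ic->status[ic->head++];
	*again = (ic->head != ic->tail);    /* EOI: "more pending" */
	return ret;
}

/* Stand-in for the new kvmppc_read_intr(): loop until the EOI no longer
 * reports a pending interrupt; return the largest non-zero status. */
static int read_intr(struct intc_sketch *ic)
{
	int rc = 0;
	bool again;

	do {
		int ret = read_one_intr(ic, &again);
		if (ret && (rc == 0 || ret > rc))
			rc = ret;
	} while (again);
	return rc;
}
```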
    • KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9 · 1704a81c
      Paul Mackerras authored
      On POWER9, the msgsnd instruction is able to send interrupts to
      other cores, as well as other threads on the local core.  Since
      msgsnd is generally simpler and faster than sending an IPI via the
      XICS, we use msgsnd for all IPIs sent by KVM on POWER9.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9 · 7c5b06ca
      Paul Mackerras authored
      POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
      and tlbiel (local tlbie) instructions.  Both instructions get a
      set of new parameters (RIC, PRS and R) which appear as bits in the
      instruction word.  The tlbiel instruction now has a second register
      operand, which contains a PID and/or LPID value if needed, and
      should otherwise contain 0.
      
      This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
      as well as older processors.  Since we only handle HPT guests so
      far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
      word as on previous processors, so we don't need to conditionally
      execute different instructions depending on the processor.
      
      The local flush on first entry to a guest in book3s_hv_rmhandlers.S
      is a loop which depends on the number of TLB sets.  Rather than
      using feature sections to set the number of iterations based on
      which CPU we're on, we now work out this number at VM creation time
      and store it in the kvm_arch struct.  That will make it possible to
      get the number from the device tree in future, which will help with
      compatibility with future processors.
      
      Since mmu_partition_table_set_entry() does a global flush of the
      whole LPID, we don't need to do the TLB flush on first entry to the
      guest on each processor.  Therefore we don't set all bits in the
      tlb_need_flush bitmap on VM startup on POWER9.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
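A sketch of deciding the flush-loop length once, at VM creation, together with the POWER9 rule for the `tlb_need_flush` bitmap. `struct vm_sketch`, both helpers and the per-generation set counts (128, 512, 256) are assumptions for illustration; the bitmap is limited to 64 CPUs to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

enum cpu_gen { GEN_POWER7, GEN_POWER8, GEN_POWER9 };

/* Hypothetical subset of per-VM state (the real field lives in kvm_arch). */
struct vm_sketch {
	int tlb_sets;          /* iterations for the local flush loop */
	uint64_t need_flush;   /* bit n set: CPU n must flush on first entry */
};

/* Decide the TLB geometry once, at VM creation (illustrative counts). */
static void vm_create_sketch(struct vm_sketch *vm, enum cpu_gen gen, int ncpus)
{
	switch (gen) {
	case GEN_POWER7: vm->tlb_sets = 128; break;
	case GEN_POWER8: vm->tlb_sets = 512; break;
	case GEN_POWER9: vm->tlb_sets = 256; break;
	}
	/* On POWER9, installing the partition-table entry already flushed the
	 * whole LPID globally, so no CPU needs a first-entry flush. */
	if (gen == GEN_POWER9)
		vm->need_flush = 0;
	else
		vm->need_flush = ncpus >= 64 ? UINT64_MAX
					     : (UINT64_C(1) << ncpus) - 1;
}

/* First guest entry on a CPU (cpu < 64): returns how many per-set flush
 * iterations would run, and clears that CPU's bit. */
static int first_entry_sketch(struct vm_sketch *vm, int cpu)
{
	uint64_t bit = UINT64_C(1) << cpu;

	if (!(vm->need_flush & bit))
		return 0;
	vm->need_flush &= ~bit;
	return vm->tlb_sets;
}
```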
    • KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs · e9cf1e08
      Paul Mackerras authored
      This adds code to handle two new guest-accessible special-purpose
      registers on POWER9: TIDR (thread ID register) and PSSCR (processor
      stop status and control register).  They are context-switched
      between host and guest, and the guest values can be read and set
      via the one_reg interface.
      
      The PSSCR contains some fields which are guest-accessible and some
      which are only accessible in hypervisor mode.  We only allow the
      guest-accessible fields to be read or set by userspace.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
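A sketch of the masking rule for the one_reg interface, assuming a made-up mask value (`PSSCR_GUEST_BITS_SKETCH`); the real guest-accessible fields are defined by ISA v3.00. Writes from userspace change only the masked fields, and reads expose only those fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask of the guest-accessible PSSCR fields; illustrative
 * value only, not the ISA v3.00 field layout. */
#define PSSCR_GUEST_BITS_SKETCH UINT64_C(0xf0000000000003ff)

/* one_reg set: take only guest-accessible fields from userspace and
 * leave hypervisor-only fields untouched. */
static uint64_t psscr_set_sketch(uint64_t cur, uint64_t user_val)
{
	return (cur & ~PSSCR_GUEST_BITS_SKETCH) |
	       (user_val & PSSCR_GUEST_BITS_SKETCH);
}

/* one_reg get: expose only guest-accessible fields to userspace. */
static uint64_t psscr_get_sketch(uint64_t cur)
{
	return cur & PSSCR_GUEST_BITS_SKETCH;
}
```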
    • KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9 · 83677f55
      Paul Mackerras authored
      Some special-purpose registers that were present and accessible
      by guests on POWER8 no longer exist on POWER9, so this adds
      feature sections to ensure that we don't try to context-switch
      them when going into or out of a guest on POWER9.  These are
      all relatively obscure, rarely-used registers, but we had to
      context-switch them on POWER8 to avoid creating a covert channel.
      They are: SPMC1, SPMC2, MMCRS, CSIGR, TACR, TCSCR, and ACOP.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9 · 7a84084c
      Paul Mackerras authored
      On POWER9, the SDR1 register (hashed page table base address) is no
      longer used, and instead the hardware reads the HPT base address
      and size from the partition table.  The partition table entry also
      contains the bits that specify the page size for the VRMA mapping,
      which were previously in the LPCR.  The VPM0 bit of the LPCR is
      now reserved; the processor now always uses the VRMA (virtual
      real-mode area) mechanism for guest real-mode accesses in HPT mode,
      and the RMO (real-mode offset) mechanism has been dropped.
      
      When entering or exiting the guest, we now only have to set the
      LPIDR (logical partition ID register), not the SDR1 register.
      There is also no requirement now to transition via a reserved
      LPID value.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9 · abb7c7dd
      Paul Mackerras authored
      This adapts the KVM-HV hashed page table (HPT) code to read and write
      HPT entries in the new format defined in Power ISA v3.00 on POWER9
      machines.  The new format moves the B (segment size) field from the
      first doubleword to the second, and trims some bits from the AVA
      (abbreviated virtual address) and ARPN (abbreviated real page number)
      fields.  As far as possible, the conversion is done when reading or
      writing the HPT entries, and the rest of the code continues to use
      the old format.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
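A sketch of converting at the read/write boundary, covering only the move of the B (segment size) field. The shift and mask constants are modelled on the description above and should be treated as assumptions; in particular, `V_COMMON_BITS` stands in for dropping the AVA bits the new format no longer has, so the round trip holds only for entries that use no bits outside that mask and the B field.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: old format keeps B in the two bits at shift 62 of the
 * first doubleword (v); ISA v3.00 keeps it at shift 58 of the second (r). */
#define V_SSIZE_SHIFT      62
#define R_3_0_SSIZE_SHIFT  58
#define R_3_0_SSIZE_MASK   (UINT64_C(3) << R_3_0_SSIZE_SHIFT)
#define V_COMMON_BITS      UINT64_C(0x000fffffffffffff)

/* Old -> new, applied when writing an HPTE on POWER9. */
static uint64_t old_to_new_v(uint64_t v)
{
	return v & V_COMMON_BITS;
}

static uint64_t old_to_new_r(uint64_t v, uint64_t r)
{
	return (r & ~R_3_0_SSIZE_MASK) |
	       ((v >> V_SSIZE_SHIFT) << R_3_0_SSIZE_SHIFT);
}

/* New -> old, applied when reading, so other code keeps the old format. */
static uint64_t new_to_old_v(uint64_t v, uint64_t r)
{
	return (v & V_COMMON_BITS) |
	       ((r & R_3_0_SSIZE_MASK) << (V_SSIZE_SHIFT - R_3_0_SSIZE_SHIFT));
}

static uint64_t new_to_old_r(uint64_t r)
{
	return r & ~R_3_0_SSIZE_MASK;
}
```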
    • Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next · bc33b1fc
      Paul Mackerras authored
      This merges in the ppc-kvm topic branch to get changes to
      arch/powerpc code that are necessary for adding POWER9 KVM support.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • powerpc/powernv: Define and set POWER9 HFSCR doorbell bit · 02ed21ae
      Michael Neuling authored
      Define and set the POWER9 HFSCR doorbell bit so that guests can use
      msgsndp.
      
      ISA 3.0 calls this MSGP, so name it accordingly in the code.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 22 Nov, 2016 8 commits
    • powerpc/reg: Add definition for LPCR_PECE_HVEE · 1f0f2e72
      Michael Ellerman authored
      ISA 3.0 defines a new PECE (Power-saving mode Exit Cause Enable) field
      in the LPCR (Logical Partitioning Control Register), called
      LPCR_PECE_HVEE (Hypervisor Virtualization Exit Enable).
      
      KVM code will need to know about this bit, so add a definition for it.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Define new ISA v3.00 logical PVR value and PCR register value · 9dd17e85
      Suraj Jitindar Singh authored
      ISA 3.00 adds the logical PVR value 0x0f000005, so add a definition for
      this.
      
      Define PCR_ARCH_207 to reflect ISA 2.07 compatibility mode in the processor
      compatibility register (PCR).
      
      [paulus@ozlabs.org - moved dummy PCR_ARCH_300 value into next patch]
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Define real-mode versions of OPAL XICS accessors · ffe6d810
      Paul Mackerras authored
      This defines real-mode versions of opal_int_get_xirr(), opal_int_eoi()
      and opal_int_set_mfrr(), for use by KVM real-mode code.
      
      It also exports opal_int_set_mfrr() so that the modular part of KVM
      can use it to send IPIs.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Provide functions for accessing POWER9 partition table · 9d661958
      Paul Mackerras authored
      POWER9 requires the host to set up a partition table, which is a
      table in memory indexed by logical partition ID (LPID) which
      contains the pointers to page tables and process tables for the
      host and each guest.
      
      This factors out the initialization of the partition table into
      a single function.  This code was previously duplicated between
      hash_utils_64.c and pgtable-radix.c.
      
      This provides a function for setting a partition table entry,
      which is used in early MMU initialization, and will be used by
      KVM whenever a guest is created.  This function includes a tlbie
      instruction which will flush all TLB entries for the LPID and
      all caches of the partition table entry for the LPID, across the
      system.
      
      This also moves a call to memblock_set_current_limit(), which was
      in radix_init_partition_table(), but has nothing to do with the
      partition table.  By analogy with the similar code for hash, the
      call gets moved to near the end of radix__early_init_mmu().  It
      now gets called when running as a guest, whereas previously it
      would only be called if the kernel is running as the host.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
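A minimal model of a partition table indexed by LPID, with a setter that stores both doublewords and then performs the flush. Everything here is hypothetical: the table size and names are invented, a real entry is stored big-endian in memory read by the hardware, and the real flush is a tlbie, which this sketch only counts.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define LPID_COUNT_SKETCH 16   /* hypothetical table size */

struct patb_entry_sketch {
	uint64_t dw0;   /* e.g. page-table (HPT) base and size */
	uint64_t dw1;   /* e.g. process-table pointer */
};

static struct patb_entry_sketch partition_table_sketch[LPID_COUNT_SKETCH];
static int flush_count_sketch;   /* counts the simulated global flushes */

/* Hypothetical setter: store the entry, then flush every TLB entry and
 * cached partition-table entry for this LPID (modelled by a counter). */
static int set_partition_entry_sketch(unsigned lpid, uint64_t dw0, uint64_t dw1)
{
	if (lpid >= LPID_COUNT_SKETCH)
		return -EINVAL;
	partition_table_sketch[lpid].dw0 = dw0;
	partition_table_sketch[lpid].dw1 = dw1;
	flush_count_sketch++;
	return 0;
}
```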
    • KVM: s390: handle floating point registers in the run ioctl not in vcpu_put/load · e1788bb9
      Christian Borntraeger authored
      Right now we switch the host fprs/vrs in kvm_arch_vcpu_load and switch
      back in kvm_arch_vcpu_put. This process is already optimized
      since commit 9977e886 ("s390/kernel: lazy restore fpu registers")
      avoiding double save/restores on schedule. We still reload the pointers
      and test the guest fpc on each context switch, though.
      
      We can minimize the cost of vcpu_load/put by doing the test in the
      VCPU_RUN ioctl itself. As most VCPU threads almost never exit to
      userspace in the common fast path, this avoids the overhead
      for the common case (eventfd driven I/O, all exits including sleep
      handled in the kernel) - making kvm_arch_vcpu_load/put basically
      disappear in perf top.
      
      Also adapt the fpu get/set ioctls.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • KVM: s390: handle access registers in the run ioctl not in vcpu_put/load · 31d8b8d4
      Christian Borntraeger authored
      Right now we save the host access registers in kvm_arch_vcpu_load
      and load them in kvm_arch_vcpu_put. Vice versa for the guest access
      registers. On schedule, this means that we load/save access registers
      multiple times.
      
      e.g. VCPU_RUN with just one reschedule and then return does
      
      [from user space via VCPU_RUN]
      - save the host registers in kvm_arch_vcpu_load (via ioctl)
      - load the guest registers in kvm_arch_vcpu_load (via ioctl)
      - do guest stuff
      - decide to schedule/sleep
      - save the guest registers in kvm_arch_vcpu_put (via sched)
      - load the host registers in kvm_arch_vcpu_put (via sched)
      - save the host registers in switch_to (via sched)
      - schedule
      - return
      - load the host registers in switch_to (via sched)
      - save the host registers in kvm_arch_vcpu_load (via sched)
      - load the guest registers in kvm_arch_vcpu_load (via sched)
      - do guest stuff
      - decide to go to userspace
      - save the guest registers in kvm_arch_vcpu_put (via ioctl)
      - load the host registers in kvm_arch_vcpu_put (via ioctl)
      [back to user space]
      
      As the kernel does not use access registers, we can avoid
      this reloading and simply piggyback on switch_to (letting it save
      the guest values instead of host values in thread.acrs) by
      moving the host/guest switch into the VCPU_RUN ioctl function.
      We now do
      
      [from user space via VCPU_RUN]
      - save the host registers in kvm_arch_vcpu_ioctl_run
      - load the guest registers in kvm_arch_vcpu_ioctl_run
      - do guest stuff
      - decide to schedule/sleep
      - save the guest registers in switch_to
      - schedule
      - return
      - load the guest registers in switch_to (via sched)
      - do guest stuff
      - decide to go to userspace
      - save the guest registers in kvm_arch_vcpu_ioctl_run
      - load the host registers in kvm_arch_vcpu_ioctl_run
      
      This seems to save about 10% of the vcpu_put/load functions
      according to perf.
      
      As vcpu_load no longer switches the acrs, we can also remove the
      loading of the acrs in kvm_arch_vcpu_ioctl_set_sregs.
      Suggested-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • kvm: x86: don't print warning messages for unimplemented msrs · ae0f5499
      Bandan Das authored
      Change unimplemented msrs messages to use pr_debug.
      If CONFIG_DYNAMIC_DEBUG is set, then these messages can be
      enabled at run time or else -DDEBUG can be used at compile
      time to enable them. These messages will still be printed if
      ignore_msrs=1.
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
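A sketch of the resulting behaviour for an unimplemented MSR write, under stated assumptions: `ignore_msrs_sketch` stands in for the `ignore_msrs` module parameter, the message text is invented, and the debug message is compiled in only with -DDEBUG (dynamic debug is not modelled). A return value of 1 means the caller should fail the access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stands in for the ignore_msrs module parameter. */
static bool ignore_msrs_sketch;

/* Hypothetical fallback for a write to an unimplemented MSR.
 * Returns 0 if the write is ignored, 1 if the caller should fail it. */
static int unhandled_wrmsr_sketch(unsigned msr, unsigned long long data)
{
	if (ignore_msrs_sketch) {
		/* still printed when ignore_msrs=1 */
		fprintf(stderr, "ignored wrmsr: 0x%x data 0x%llx\n", msr, data);
		return 0;
	}
#ifdef DEBUG
	/* stands in for pr_debug(): only built with -DDEBUG in this sketch */
	fprintf(stderr, "unhandled wrmsr: 0x%x data 0x%llx\n", msr, data);
#endif
	return 1;
}
```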
    • KVM: nVMX: invvpid handling improvements · bcdde302
      Jan Dakinevich authored
       - Expose all invalidation types to L1

       - Reject the invvpid instruction if L1 passes a zero VPID to a
         single-context invalidation
      Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
      Tested-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
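A sketch of the VPID check for L1's INVVPID, using the type numbers from the Intel SDM (0 individual-address, 1 single-context, 2 all-context, 3 single-context-retaining-globals). The function name and return convention are hypothetical, and the address checks for the individual-address type are not modelled; per the SDM that type also requires a non-zero VPID.

```c
#include <assert.h>
#include <stdbool.h>

/* INVVPID types as numbered in the Intel SDM. */
enum invvpid_type {
	INVVPID_INDIVIDUAL_ADDR = 0,
	INVVPID_SINGLE_CONTEXT = 1,
	INVVPID_ALL_CONTEXT = 2,
	INVVPID_SINGLE_NON_GLOBAL = 3,   /* single-context-retaining-globals */
};

/* Hypothetical check: true if L1's request passes the VPID rules, false
 * if the instruction should VMfail. */
static bool invvpid_vpid_ok_sketch(unsigned type, unsigned vpid)
{
	switch (type) {
	case INVVPID_INDIVIDUAL_ADDR:    /* address checks not modelled */
	case INVVPID_SINGLE_CONTEXT:
	case INVVPID_SINGLE_NON_GLOBAL:
		return vpid != 0;        /* VPID 0 is reserved for the host */
	case INVVPID_ALL_CONTEXT:
		return true;             /* the VPID operand is ignored */
	default:
		return false;            /* unsupported type */
	}
}
```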