1. 03 Mar, 2016 10 commits
  2. 01 Mar, 2016 4 commits
  3. 29 Feb, 2016 9 commits
    • Suresh E. Warrier's avatar
      KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection · 520fe9c6
      Suresh E. Warrier authored
      Redirecting the wakeup of a VCPU from the H_IPI hypercall to
      a core running in the host is usually a good idea, most workloads
      seemed to benefit. However, in one heavily interrupt-driven SMT1
      workload, some regression was observed. This patch adds a kvm_hv
      module parameter called h_ipi_redirect to control this feature.
      
      The default value for this tunable is 1 - that is enable the feature.
      Signed-off-by: default avatarSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      520fe9c6
    • Suresh E. Warrier's avatar
      KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU · e17769eb
      Suresh E. Warrier authored
      This patch adds support to real-mode KVM to search for a core
      running in the host partition and send it an IPI message with
      VCPU to be woken. This avoids having to switch to the host
      partition to complete an H_IPI hypercall when the VCPU which
      is the target of the the H_IPI is not loaded (is not running
      in the guest).
      
      The patch also includes the support in the IPI handler running
      in the host to do the wakeup by calling kvmppc_xics_ipi_action
      for the PPC_MSG_RM_HOST_ACTION message.
      
      When a guest is being destroyed, we need to ensure that there
      are no pending IPIs waiting to wake up a VCPU before we free
      the VCPUs of the guest. This is accomplished by:
      - Forces a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs
        before freeing any VCPUs in kvm_arch_destroy_vm().
      - Any PPC_MSG_RM_HOST_ACTION messages must be executed first
        before any other PPC_MSG_CALL_FUNCTION messages.
      Signed-off-by: default avatarSuresh Warrier <warrier@linux.vnet.ibm.com>
      Acked-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      e17769eb
    • Suresh Warrier's avatar
      KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM · 0c2a6606
      Suresh Warrier authored
      This patch adds the support for the kick VCPU operation for
      kvmppc_host_rm_ops. The kvmppc_xics_ipi_action() function
      provides the function to be invoked for a host side operation
      when poked by the real mode KVM. This is initiated by KVM by
      sending an IPI to any free host core.
      
      KVM real mode must set the rm_action to XICS_RM_KICK_VCPU and
      rm_data to point to the VCPU to be woken up before sending the IPI.
      Note that we have allocated one kvmppc_host_rm_core structure
      per core. The above values need to be set in the structure
      corresponding to the core to which the IPI will be sent.
      Signed-off-by: default avatarSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      0c2a6606
    • Suresh Warrier's avatar
      KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUs · 6f3bb809
      Suresh Warrier authored
      The kvmppc_host_rm_ops structure keeps track of which cores are
      are in the host by maintaining a bitmask of active/runnable
      online CPUs that have not entered the guest. This patch adds
      support to manage the bitmask when a CPU is offlined or onlined
      in the host.
      Signed-off-by: default avatarSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      6f3bb809
    • Suresh Warrier's avatar
      KVM: PPC: Book3S HV: Manage core host state · b8e6a87c
      Suresh Warrier authored
      Update the core host state in kvmppc_host_rm_ops whenever
      the primary thread of the core enters the guest or returns
      back.
      Signed-off-by: default avatarSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      b8e6a87c
    • Suresh Warrier's avatar
      KVM: PPC: Book3S HV: Host-side RM data structures · 79b6c247
      Suresh Warrier authored
      This patch defines the data structures to support the setting up
      of host side operations while running in real mode in the guest,
      and also the functions to allocate and free it.
      
      The operations are for now limited to virtual XICS operations.
      Currently, we have only defined one operation in the data
      structure:
               - Wake up a VCPU sleeping in the host when it
                 receives a virtual interrupt
      
      The operations are assigned at the core level because PowerKVM
      requires that the host run in SMT off mode. For each core,
      we will need to manage its state atomically - where the state
      is defined by:
      1. Is the core running in the host?
      2. Is there a Real Mode (RM) operation pending on the host?
      
      Currently, core state is only managed at the whole-core level
      even when the system is in split-core mode. This just limits
      the number of free or "available" cores in the host to perform
      any host-side operations.
      
      The kvmppc_host_rm_core.rm_data allows any data to be passed by
      KVM in real mode to the host core along with the operation to
      be performed.
      
      The kvmppc_host_rm_ops structure is allocated the very first time
      a guest VM is started. Initial core state is also set - all online
      cores are in the host. This structure is never deleted, not even
      when there are no active guests. However, it needs to be freed
      when the module is unloaded because the kvmppc_host_rm_ops_hv
      can contain function pointers to kvm-hv.ko functions for the
      different supported host operations.
      Signed-off-by: default avatarSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      79b6c247
    • Suresh Warrier's avatar
      powerpc/xics: Add icp_native_cause_ipi_rm · ec13e9b6
      Suresh Warrier authored
      Function to cause an IPI by directly updating the MFFR register
      in the XICS. The function is meant for real-mode callers since
      they cannot use the smp_ops->cause_ipi function which uses an
      ioremapped address.
      
      Normal usage is for the the KVM real mode code to set the IPI message
      using smp_muxed_ipi_message_pass and then invoke icp_native_cause_ipi_rm
      to cause the actual IPI.
      
      The function requires kvm_hstate.xics_phys to have been initialized
      with the physical address of XICS.
      Signed-off-by: default avatarSuresh Warrier <warrier@linux.vnet.ibm.com>
      Acked-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      ec13e9b6
    • Suresh Warrier's avatar
      powerpc/smp: Add smp_muxed_ipi_set_message · 31639c77
      Suresh Warrier authored
      smp_muxed_ipi_message_pass() invokes smp_ops->cause_ipi, which
      uses an ioremapped address to access registers on the XICS
      interrupt controller to cause the IPI. Because of this real
      mode callers cannot call smp_muxed_ipi_message_pass() for IPI
      messaging.
      
      This patch creates a separate function smp_muxed_ipi_set_message
      just to set the IPI message without the cause_ipi routine.
      After calling this function to set the IPI message, real
      mode callers must cause the IPI by writing to the XICS registers
      directly.
      
      As part of this, we also change smp_muxed_ipi_message_pass
      to call smp_muxed_ipi_set_message to set the message instead
      of doing it directly inside the routine.
      Signed-off-by: default avatarSuresh Warrier <warrier@linux.vnet.ibm.com>
      Acked-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      31639c77
    • Suresh Warrier's avatar
      powerpc/smp: Support more IPI messages · bd7f561f
      Suresh Warrier authored
      This patch increases the number of demuxed messages for a
      controller with a single ipi to 8 for 64-bit systems.
      
      This is required because we want to use the IPI mechanism
      to send messages from a CPU running in KVM real mode in a
      guest to a CPU in the host to take some action. Currently,
      we only support 4 messages and all 4 are already taken.
      
      Define a fifth message PPC_MSG_RM_HOST_ACTION for this
      purpose.
      Signed-off-by: default avatarSuresh Warrier <warrier@linux.vnet.ibm.com>
      Acked-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      bd7f561f
  4. 23 Feb, 2016 6 commits
  5. 16 Feb, 2016 11 commits
    • Paolo Bonzini's avatar
      KVM: x86: pass kvm_get_time_scale arguments in hertz · 3ae13faa
      Paolo Bonzini authored
      Prepare for improving the precision in the next patch.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3ae13faa
    • Andrey Smetanin's avatar
      kvm/x86: Hyper-V VMBus hypercall userspace exit · 83326e43
      Andrey Smetanin authored
      The patch implements KVM_EXIT_HYPERV userspace exit
      functionality for Hyper-V VMBus hypercalls:
      HV_X64_HCALL_POST_MESSAGE, HV_X64_HCALL_SIGNAL_EVENT.
      
      Changes v3:
      * use vcpu->arch.complete_userspace_io to setup hypercall
      result
      
      Changes v2:
      * use KVM_EXIT_HYPERV for hypercalls
      Signed-off-by: default avatarAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: default avatarRoman Kagan <rkagan@virtuozzo.com>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Joerg Roedel <joro@8bytes.org>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      CC: Haiyang Zhang <haiyangz@microsoft.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      83326e43
    • Andrey Smetanin's avatar
      kvm/x86: Reject Hyper-V hypercall continuation · b2fdc257
      Andrey Smetanin authored
      Currently we do not support Hyper-V hypercall continuation
      so reject it.
      Signed-off-by: default avatarAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: default avatarRoman Kagan <rkagan@virtuozzo.com>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Joerg Roedel <joro@8bytes.org>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      CC: Haiyang Zhang <haiyangz@microsoft.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b2fdc257
    • Andrey Smetanin's avatar
      kvm/x86: Pass return code of kvm_emulate_hypercall · 0d9c055e
      Andrey Smetanin authored
      Pass the return code from kvm_emulate_hypercall on to the caller,
      in order to allow it to indicate to the userspace that
      the hypercall has to be handled there.
      
      Also adjust all the existing code paths to return 1 to make sure the
      hypercall isn't passed to the userspace without setting kvm_run
      appropriately.
      Signed-off-by: default avatarAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: default avatarRoman Kagan <rkagan@virtuozzo.com>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Joerg Roedel <joro@8bytes.org>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      CC: Haiyang Zhang <haiyangz@microsoft.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      0d9c055e
    • Andrey Smetanin's avatar
      drivers/hv: Move VMBus hypercall codes into Hyper-V UAPI header · 18f09861
      Andrey Smetanin authored
      VMBus hypercall codes inside Hyper-V UAPI header will
      be used by QEMU to implement VMBus host devices support.
      Signed-off-by: default avatarAndrey Smetanin <asmetanin@virtuozzo.com>
      Acked-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Reviewed-by: default avatarRoman Kagan <rkagan@virtuozzo.com>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Joerg Roedel <joro@8bytes.org>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      CC: Haiyang Zhang <haiyangz@microsoft.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      [Do not rename the constant at the same time as moving it, as that
       would cause semantic conflicts with the Hyper-V tree. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      18f09861
    • Andrey Smetanin's avatar
      kvm/x86: Rename Hyper-V long spin wait hypercall · 8ed6d767
      Andrey Smetanin authored
      Rename HV_X64_HV_NOTIFY_LONG_SPIN_WAIT by HVCALL_NOTIFY_LONG_SPIN_WAIT,
      so the name is more consistent with the other hypercalls.
      Signed-off-by: default avatarAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: default avatarRoman Kagan <rkagan@virtuozzo.com>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Joerg Roedel <joro@8bytes.org>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      CC: Haiyang Zhang <haiyangz@microsoft.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      [Change name, Andrey used HV_X64_HCALL_NOTIFY_LONG_SPIN_WAIT. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8ed6d767
    • Paolo Bonzini's avatar
      KVM: x86: fix missed hardware breakpoints · 4e422bdd
      Paolo Bonzini authored
      Sometimes when setting a breakpoint a process doesn't stop on it.
      This is because the debug registers are not loaded correctly on
      VCPU load.
      
      The following simple reproducer from Oleg Nesterov tries using debug
      registers in both the host and the guest, for example by running "./bp
      0 1" on the host and "./bp 14 15" under QEMU.
      
          #include <unistd.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <stdio.h>
          #include <sys/wait.h>
          #include <sys/ptrace.h>
          #include <sys/user.h>
          #include <asm/debugreg.h>
          #include <assert.h>
      
          #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
      
          unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
          {
              unsigned long dr7;
      
              dr7 = ((len | type) & 0xf)
                  << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
              if (enable)
                  dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
      
              return dr7;
          }
      
          int write_dr(int pid, int dr, unsigned long val)
          {
              return ptrace(PTRACE_POKEUSER, pid,
                      offsetof (struct user, u_debugreg[dr]),
                      val);
          }
      
          void set_bp(pid_t pid, void *addr)
          {
              unsigned long dr7;
              assert(write_dr(pid, 0, (long)addr) == 0);
              dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
              assert(write_dr(pid, 7, dr7) == 0);
          }
      
          void *get_rip(int pid)
          {
              return (void*)ptrace(PTRACE_PEEKUSER, pid,
                      offsetof(struct user, regs.rip), 0);
          }
      
          void test(int nr)
          {
              void *bp_addr = &&label + nr, *bp_hit;
              int pid;
      
              printf("test bp %d\n", nr);
              assert(nr < 16); // see 16 asm nops below
      
              pid = fork();
              if (!pid) {
                  assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
                  kill(getpid(), SIGSTOP);
                  for (;;) {
                      label: asm (
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                      );
                  }
              }
      
              assert(pid == wait(NULL));
              set_bp(pid, bp_addr);
      
              for (;;) {
                  assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
                  assert(pid == wait(NULL));
      
                  bp_hit = get_rip(pid);
                  if (bp_hit != bp_addr)
                      fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
                          bp_hit - &&label, nr);
              }
          }
      
          int main(int argc, const char *argv[])
          {
              while (--argc) {
                  int nr = atoi(*++argv);
                  if (!fork())
                      test(nr);
              }
      
              while (wait(NULL) > 0)
                  ;
              return 0;
          }
      
      Cc: stable@vger.kernel.org
      Suggested-by: default avatarNadadv Amit <namit@cs.technion.ac.il>
      Reported-by: default avatarAndrey Wagin <avagin@gmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4e422bdd
    • Radim Krčmář's avatar
      KVM: x86: fix *NULL on invalid low-prio irq · 4efd805f
      Radim Krčmář authored
      Smatch noticed a NULL dereference in kvm_intr_is_single_vcpu_fast that
      happens if VM already warned about invalid lowest-priority interrupt.
      
      Create a function for common code while fixing it.
      
      Fixes: 6228a0da ("KVM: x86: Add lowest-priority support for vt-d posted-interrupts")
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4efd805f
    • Paolo Bonzini's avatar
      KVM: x86: rewrite handling of scaled TSC for kvmclock · 78db6a50
      Paolo Bonzini authored
      This is the same as before:
      
          kvm_scale_tsc(tgt_tsc_khz)
              = tgt_tsc_khz * ratio
              = tgt_tsc_khz * user_tsc_khz / tsc_khz   (see set_tsc_khz)
              = user_tsc_khz                           (see kvm_guest_time_update)
              = vcpu->arch.virtual_tsc_khz             (see kvm_set_tsc_khz)
      
      However, computing it through kvm_scale_tsc will make it possible
      to include the NTP correction in tgt_tsc_khz.
      Reviewed-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      78db6a50
    • Paolo Bonzini's avatar
      KVM: x86: rename argument to kvm_set_tsc_khz · 4941b8cb
      Paolo Bonzini authored
      This refers to the desired (scaled) frequency, which is called
      user_tsc_khz in the rest of the file.
      Reviewed-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4941b8cb
    • Jan Kiszka's avatar
      KVM: VMX: Fix guest debugging while in L2 · 6f05485d
      Jan Kiszka authored
      When we take a #DB or #BP vmexit while in guest mode, we first of all
      need to check if there is ongoing guest debugging that might be
      interested in the event. Currently, we unconditionally leave L2 and
      inject the event into L1 if it is intercepting the exceptions. That
      breaks things marvelously.
      Signed-off-by: default avatarJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6f05485d