1. 15 Mar, 2019 5 commits
    • Paolo Bonzini's avatar
      kvm: vmx: fix formatting of a comment · 4a605bc0
      Paolo Bonzini authored
      Eliminate a gratuitous conflict with 5.0.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4a605bc0
    • Sean Christopherson's avatar
      KVM: doc: Document the life cycle of a VM and its resources · eca6be56
      Sean Christopherson authored
      The series to add memcg accounting to KVM allocations[1] states:
      
        There are many KVM kernel memory allocations which are tied to the
        life of the VM process and should be charged to the VM process's
        cgroup.
      
      While it is correct to account KVM kernel allocations to the cgroup of
      the process that created the VM, it's technically incorrect to state
      that the KVM kernel memory allocations are tied to the life of the VM
      process.  This is because the VM itself, i.e. struct kvm, is not tied to
      the life of the process which created it, rather it is tied to the life
      of its associated file descriptor.  In other words, kvm_destroy_vm() is
      not invoked until fput() decrements its associated file's refcount to
      zero.  A simple example is to fork() in Qemu and have the child sleep
      indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
      descriptor *and* the rogue child is killed.
      
      The allocations are guaranteed to be *accounted* to the process which
      created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
      ioctl() with -EIO if kvm->mm != current->mm.  I.e. the child can keep
      the VM "alive" but can't do anything useful with its reference.
      
      Note that because 'struct kvm' also holds a reference to the mm_struct
      of its owner, the above behavior also applies to userspace allocations.
      
      Given that mucking with a VM's file descriptor can lead to subtle and
      undesirable behavior, e.g. memcg charges persisting after a VM is shut
      down, explicitly document a VM's lifecycle and its impact on the VM's
      resources.
      
      Alternatively, KVM could aggressively free resources when the creating
      process exits, e.g. via mmu_notifier->release().  However, mmu_notifier
      isn't guaranteed to be available, and freeing resources when the creator
      exits is likely to be error prone and fragile as KVM would need to
      ensure that it only freed resources that are truly out of reach. In
      practice, the existing behavior shouldn't be problematic as a properly
      configured system will prevent a child process from being moved out of
      the appropriate cgroup hierarchy, i.e. prevent hiding the process from
      the OOM killer, and will prevent an unprivileged user from being able to
      to hold a reference to struct kvm via another method, e.g. debugfs.
      
      [1]https://patchwork.kernel.org/patch/10806707/Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      eca6be56
    • Paolo Bonzini's avatar
      Merge tag 'kvm-ppc-next-5.1-3' of... · c7a0e83c
      Paolo Bonzini authored
      Merge tag 'kvm-ppc-next-5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
      
      Third PPC KVM update for 5.1
      
      - Tell userspace about whether a particular hardware workaround for
        one of the Spectre vulnerabilities is available, so that userspace
        can inform the guest.
      c7a0e83c
    • Sean Christopherson's avatar
      MAINTAINERS: Add KVM selftests to existing KVM entry · 46333236
      Sean Christopherson authored
      It's safe to assume Paolo and Radim are maintaining the KVM selftests
      given that the vast majority of commits have their SOBs.  Play nice
      with get_maintainers and make it official.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      46333236
    • Ben Gardon's avatar
      Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()" · 92da008f
      Ben Gardon authored
      This reverts commit 71883a62.
      
      The above commit contains an optimization to kvm_zap_gfn_range which
      uses gfn-limited TLB flushes, if enabled. If using these limited flushes,
      kvm_zap_gfn_range passes lock_flush_tlb=false to slot_handle_level_range
      which creates a race when the function unlocks to call cond_resched.
      See an example of this race below:
      
      CPU 0                   CPU 1                           CPU 3
      // zap_direct_gfn_range
      mmu_lock()
      // *ptep == pte_1
      *ptep = 0
      if (lock_flush_tlb)
              flush_tlbs()
      mmu_unlock()
                              // In invalidate range
                              // MMU notifier
                              mmu_lock()
                              if (pte != 0)
                                      *ptep = 0
                                      flush = true
                              if (flush)
                                      flush_remote_tlbs()
                              mmu_unlock()
                              return
                              // Host MM reallocates
                              // page previously
                              // backing guest memory.
                                                              // Guest accesses
                                                              // invalid page
                                                              // through pte_1
                                                              // in its TLB!!
      
      Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and
      	without this patch. The patch introduced no new failures.
      Signed-off-by: default avatarBen Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      92da008f
  2. 01 Mar, 2019 1 commit
  3. 26 Feb, 2019 1 commit
    • Paul Mackerras's avatar
      KVM: PPC: Fix compilation when KVM is not enabled · e74d53e3
      Paul Mackerras authored
      Compiling with CONFIG_PPC_POWERNV=y and KVM disabled currently gives
      an error like this:
      
        CC      arch/powerpc/kernel/dbell.o
      In file included from arch/powerpc/kernel/dbell.c:20:0:
      arch/powerpc/include/asm/kvm_ppc.h: In function ‘xics_on_xive’:
      arch/powerpc/include/asm/kvm_ppc.h:625:9: error: implicit declaration of function ‘xive_enabled’ [-Werror=implicit-function-declaration]
        return xive_enabled() && cpu_has_feature(CPU_FTR_HVMODE);
               ^
      cc1: all warnings being treated as errors
      scripts/Makefile.build:276: recipe for target 'arch/powerpc/kernel/dbell.o' failed
      make[3]: *** [arch/powerpc/kernel/dbell.o] Error 1
      
      Fix this by making the xics_on_xive() definition conditional on the
      same symbol (CONFIG_KVM_BOOK3S_64_HANDLER) that determines whether we
      include <asm/xive.h> or not, since that's the header that defines
      xive_enabled().
      
      Fixes: 03f95332 ("KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVE")
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      e74d53e3
  4. 22 Feb, 2019 12 commits
  5. 21 Feb, 2019 3 commits
    • Paul Mackerras's avatar
      powerpc/64s: Better printing of machine check info for guest MCEs · c0577201
      Paul Mackerras authored
      This adds an "in_guest" parameter to machine_check_print_event_info()
      so that we can avoid trying to translate guest NIP values into
      symbolic form using the host kernel's symbol table.
      Reviewed-by: default avatarAravinda Prasad <aravinda@linux.vnet.ibm.com>
      Reviewed-by: default avatarMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      c0577201
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Simplify machine check handling · 884dfb72
      Paul Mackerras authored
      This makes the handling of machine check interrupts that occur inside
      a guest simpler and more robust, with less done in assembler code and
      in real mode.
      
      Now, when a machine check occurs inside a guest, we always get the
      machine check event struct and put a copy in the vcpu struct for the
      vcpu where the machine check occurred.  We no longer call
      machine_check_queue_event() from kvmppc_realmode_mc_power7(), because
      on POWER8, when a vcpu is running on an offline secondary thread and
      we call machine_check_queue_event(), that calls irq_work_queue(),
      which doesn't work because the CPU is offline, but instead triggers
      the WARN_ON(lazy_irq_pending()) in pnv_smp_cpu_kill_self() (which
      fires again and again because nothing clears the condition).
      
      All that machine_check_queue_event() actually does is to cause the
      event to be printed to the console.  For a machine check occurring in
      the guest, we now print the event in kvmppc_handle_exit_hv()
      instead.
      
      The assembly code at label machine_check_realmode now just calls C
      code and then continues exiting the guest.  We no longer either
      synthesize a machine check for the guest in assembly code or return
      to the guest without a machine check.
      
      The code in kvmppc_handle_exit_hv() is extended to handle the case
      where the guest is not FWNMI-capable.  In that case we now always
      synthesize a machine check interrupt for the guest.  Previously, if
      the host thinks it has recovered the machine check fully, it would
      return to the guest without any notification that the machine check
      had occurred.  If the machine check was caused by some action of the
      guest (such as creating duplicate SLB entries), it is much better to
      tell the guest that it has caused a problem.  Therefore we now always
      generate a machine check interrupt for guests that are not
      FWNMI-capable.
      Reviewed-by: default avatarAravinda Prasad <aravinda@linux.vnet.ibm.com>
      Reviewed-by: default avatarMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      884dfb72
    • Michael Ellerman's avatar
      KVM: PPC: Book3S HV: Context switch AMR on Power9 · d976f680
      Michael Ellerman authored
      kvmhv_p9_guest_entry() implements a fast-path guest entry for Power9
      when guest and host are both running with the Radix MMU.
      
      Currently in that path we don't save the host AMR (Authority Mask
      Register) value, and we always restore 0 on return to the host. That
      is OK at the moment because the AMR is not used for storage keys with
      the Radix MMU.
      
      However we plan to start using the AMR on Radix to prevent the kernel
      from reading/writing to userspace outside of copy_to/from_user(). In
      order to make that work we need to save/restore the AMR value.
      
      We only restore the value if it is different from the guest value,
      which is already in the register when we exit to the host. This should
      mean we rarely need to actually restore the value when running a
      modern Linux as a guest, because it will be using the same value as
      us.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Tested-by: default avatarRussell Currey <ruscur@russell.cc>
      d976f680
  6. 20 Feb, 2019 18 commits