1. 30 Oct, 2012 10 commits
    • KVM: PPC: Book3S HV: Fix accounting of stolen time · c7b67670
      Paul Mackerras authored
      Currently the code that accounts stolen time tends to overestimate the
      stolen time, and will sometimes report more stolen time in a DTL
      (dispatch trace log) entry than has elapsed since the last DTL entry.
      This can cause guests to underflow the user or system time measured
      for some tasks, leading to ridiculous CPU percentages and total runtimes
      being reported by top and other utilities.
      
      In addition, the current code was designed for the previous policy where
      a vcore would only run when all the vcpus in it were runnable, and so
      only counted stolen time on a per-vcore basis.  Now that a vcore can
      run while some of the vcpus in it are doing other things in the kernel
      (e.g. handling a page fault), we need to count the time when a vcpu task
      is preempted while it is not running as part of a vcore as stolen also.
      
      To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
      vcpu_load/put functions to count preemption time while the vcpu is
      in that state.  Handling the transitions between the RUNNING and
      BUSY_IN_HOST states requires checking and updating two variables
      (accumulated time stolen and time last preempted), so we add a new
      spinlock, vcpu->arch.tbacct_lock.  This protects both the per-vcpu
      stolen/preempt-time variables, and the per-vcore variables while this
      vcpu is running the vcore.
      
      Finally, we now don't count time spent in userspace as stolen time.
      The task could be executing in userspace on behalf of the vcpu, or
      it could be preempted, or the vcpu could be genuinely stopped.  Since
      we have no way of dividing up the time between these cases, we don't
      count any of it as stolen.
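
A minimal sketch of the accounting rule described above, under stated assumptions: the type `vcpu_acct`, the marker `TB_NIL`, and the function names are hypothetical stand-ins rather than the kernel's actual code, and the spinlock is only mentioned in a comment. It shows preemption time being accumulated only while the vcpu is in the BUSY_IN_HOST state, so time spent in other states (such as userspace) is not counted as stolen.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker meaning "not currently preempted". */
#define TB_NIL UINT64_MAX

enum vcpu_state { VCPU_NOTREADY, VCPU_BUSY_IN_HOST, VCPU_RUNNABLE };

struct vcpu_acct {
    enum vcpu_state state;
    uint64_t busy_stolen;   /* accumulated stolen time, in timebase ticks */
    uint64_t busy_preempt;  /* time of last preemption, or TB_NIL */
};

static void vcpu_acct_init(struct vcpu_acct *v)
{
    v->state = VCPU_NOTREADY;
    v->busy_stolen = 0;
    v->busy_preempt = TB_NIL;
}

/* Called when the vcpu task is scheduled out. */
static void vcpu_put(struct vcpu_acct *v, uint64_t now)
{
    /* In a real kernel both fields would be updated under a spinlock. */
    if (v->state == VCPU_BUSY_IN_HOST)
        v->busy_preempt = now;
}

/* Called when the vcpu task is scheduled back in. */
static void vcpu_load(struct vcpu_acct *v, uint64_t now)
{
    if (v->state == VCPU_BUSY_IN_HOST && v->busy_preempt != TB_NIL) {
        assert(now >= v->busy_preempt);
        v->busy_stolen += now - v->busy_preempt;
        v->busy_preempt = TB_NIL;
    }
}
```

Because two variables change together on every transition, a single lock around both updates is the natural design; the sketch leaves it out only to stay self-contained.
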
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run · 8455d79e
      Paul Mackerras authored
      Currently the Book3S HV code implements a policy on multi-threaded
      processors (i.e. POWER7) that requires all of the active vcpus in a
      virtual core to be ready to run before we run the virtual core.
      However, that causes problems on reset, because reset stops all vcpus
      except vcpu 0, and can also reduce throughput since all four threads
      in a virtual core have to wait whenever any one of them hits a
      hypervisor page fault.
      
      This relaxes the policy, allowing the virtual core to run as soon as
      any vcpu in it is runnable.  With this, the KVMPPC_VCPU_STOPPED state
      and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single
      KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish
      between them.
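
The policy change can be illustrated with a small sketch. The enum and function names below are hypothetical and only model the decision described in the message: the old rule requires every vcpu in the virtual core to be runnable, while the relaxed rule requires at least one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified two-state model; names are illustrative only. */
enum vcpu_state { VCPU_NOTREADY, VCPU_RUNNABLE };

/* Old policy: run the vcore only when all vcpus are runnable. */
static bool vcore_ready_all(const enum vcpu_state *s, size_t n)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (s[i] != VCPU_RUNNABLE)
            return false;
    return n > 0;
}

/* Relaxed policy: run the vcore as soon as any vcpu is runnable. */
static bool vcore_ready_any(const enum vcpu_state *s, size_t n)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (s[i] == VCPU_RUNNABLE)
            return true;
    return false;
}
```

With only vcpu 0 runnable (the situation after a reset), the old predicate never becomes true, while the relaxed one does.
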
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Fixes for late-joining threads · 2f12f034
      Paul Mackerras authored
      If a thread in a virtual core becomes runnable while other threads
      in the same virtual core are already running in the guest, it is
      possible for the latecomer to join the others on the core without
      first pulling them all out of the guest.  Currently this only happens
      rarely, when a vcpu is first started.  This fixes some bugs and
      omissions in the code in this case.
      
      First, we need to check for VPA updates for the latecomer and make
      a DTL entry for it.  Secondly, if it comes along while the master
      vcpu is doing a VPA update, we don't need to do anything since the
      master will pick it up in kvmppc_run_core.  To handle this correctly
      we introduce a new vcore state, VCORE_STARTING.  Thirdly, there is
      a race because we currently clear the hardware thread's hwthread_req
      before waiting to see it get to nap.  A latecomer thread could have
      its hwthread_req cleared before it gets to test it, and therefore
      never increment the nap_count, leading to messages about wait_for_nap
      timeouts.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock · 913d3ff9
      Paul Mackerras authored
      There were a few places where we were traversing the list of runnable
      threads in a virtual core, i.e. vc->runnable_threads, without holding
      the vcore spinlock.  This extends the places where we hold the vcore
      spinlock to cover everywhere that we traverse that list.
      
      Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault,
      this moves the call of it from kvmppc_handle_exit out to
      kvmppc_vcpu_run, where we don't hold the vcore lock.
      
      In kvmppc_vcore_blocked, we don't actually need to check whether
      all vcpus are ceded and don't have any pending exceptions, since the
      caller has already done that.  The caller (kvmppc_run_vcpu) wasn't
      actually checking for pending exceptions, so we add that.
      
      The change of if to while in kvmppc_run_vcpu is to make sure that we
      never call kvmppc_remove_runnable() when the vcore state is RUNNING or
      EXITING.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Fix some races in starting secondary threads · 7b444c67
      Paul Mackerras authored
      Subsequent patches implementing in-kernel XICS emulation will make it
      possible for IPIs to arrive at secondary threads at arbitrary times.
      This fixes some races in how we start the secondary threads, which
      if not fixed could lead to occasional crashes of the host kernel.
      
      This makes sure that (a) we have grabbed all the secondary threads,
      and verified that they are no longer in the kernel, before we start
      any thread, (b) that the secondary thread loads its vcpu pointer
      after clearing the IPI that woke it up (so we don't miss a wakeup),
      and (c) that the secondary thread clears its vcpu pointer before
      incrementing the nap count.  It also removes unnecessary setting
      of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online · 512691d4
      Paul Mackerras authored
      When a Book3S HV KVM guest is running, we need the host to be in
      single-thread mode, that is, all of the cores (or at least all of
      the cores where the KVM guest could run) to be running only one
      active hardware thread.  This is because of the hardware restriction
      in POWER processors that all of the hardware threads in the core
      must be in the same logical partition.  Complying with this restriction
      is much easier if, from the host kernel's point of view, only one
      hardware thread is active.
      
      This adds two hooks in the SMP hotplug code to allow the KVM code to
      make sure that secondary threads (i.e. hardware threads other than
      thread 0) cannot come online while any KVM guest exists.  The KVM
      code still has to check that any core where it runs a guest has the
      secondary threads offline, but having done that check it can now be
      sure that they will not come online while the guest is running.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • PPC: ePAPR: Convert header to uapi · c99ec973
      Alexander Graf authored
      The new uapi framework splits kernel internal and user space exported
      bits of header files more cleanly. Adjust the ePAPR header accordingly.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Move mtspr/mfspr emulation into own functions · 388cf9ee
      Alexander Graf authored
      The mtspr/mfspr emulation code became quite big over time. Move it
      into its own function so things stay more readable.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: Documentation: Fix reentry-to-be-consistent paragraph · 686de182
      Alexander Graf authored
      All user space offloaded instruction emulation needs to reenter kvm
      to produce consistent state again. Fix the section in the documentation
      to mention all of them.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: 44x: fix DCR read/write · e43a0287
      Alexander Graf authored
      When remembering the direction of a DCR transaction, we should write
      to the same variable that we later interpret when doing vcpu_run
      again.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Cc: stable@vger.kernel.org
  2. 23 Oct, 2012 1 commit
  3. 22 Oct, 2012 1 commit
  4. 18 Oct, 2012 3 commits
  5. 17 Oct, 2012 4 commits
  6. 10 Oct, 2012 5 commits
  7. 08 Oct, 2012 2 commits
  8. 05 Oct, 2012 14 commits
    • arch/powerpc/kvm/e500_tlb.c: fix error return code · 12ecd957
      Julia Lawall authored
      Convert a 0 error return code to a negative one, as returned elsewhere in the
      function.
      
      A new label is also added to avoid freeing things that are known to not yet
      be allocated.
      
      A simplified version of the semantic match that finds the first problem is as
      follows: (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      identifier ret;
      expression e,e1,e2,e3,e4,x;
      @@
      
      (
      if (\(ret != 0\|ret < 0\) || ...) { ... return ...; }
      |
      ret = 0
      )
      ... when != ret = e1
      *x = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...);
      ... when != x = e2
          when != ret = e3
      *if (x == NULL || ...)
      {
        ... when != ret = e4
      *  return ret;
      }
      // </smpl>
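
The error-path pattern this commit describes can be sketched as follows. This is an illustration under stated assumptions, not the code from e500_tlb.c: `tlb_state`, `alloc_fn`, and the label names are hypothetical. It shows a failed allocation returning a negative code instead of 0, and a separate label that skips freeing memory that was never allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical state with two separately allocated buffers. */
struct tlb_state {
    void *entries;
    void *bitmap;
};

/* Injectable allocator so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t);

static int tlb_state_init(struct tlb_state *t, alloc_fn alloc)
{
    int ret = 0;

    assert(t != NULL && alloc != NULL);
    t->entries = NULL;
    t->bitmap = NULL;

    t->entries = alloc(64);
    if (!t->entries) {
        ret = -ENOMEM;          /* leaving ret at 0 here is the bug pattern */
        goto err;               /* nothing has been allocated yet */
    }

    t->bitmap = alloc(8);
    if (!t->bitmap) {
        ret = -ENOMEM;
        goto err_free_entries;  /* only entries needs freeing */
    }
    return 0;

err_free_entries:
    free(t->entries);
    t->entries = NULL;
err:
    return ret;
}

/* Allocator that always fails, for testing the error path. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```
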
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas · 55b665b0
      Paul Mackerras authored
      The PAPR paravirtualization interface lets guests register three
      different types of per-vCPU buffer areas in their memory for communication
      with the hypervisor.  These are called virtual processor areas (VPAs).
      Currently the hypercalls to register and unregister VPAs are handled
      by KVM in the kernel, and userspace has no way to know about or save
      and restore these registrations across a migration.
      
      This adds "register" codes for these three areas that userspace can
      use with the KVM_GET/SET_ONE_REG ioctls to see what addresses have
      been registered, and to register or unregister them.  This will be
      needed for guest hibernation and migration, and is also needed so
      that userspace can unregister them on reset (otherwise we corrupt
      guest memory after reboot by writing to the VPAs registered by the
      previous kernel).
      
      The "register" for the VPA is a 64-bit value containing the address,
      since the length of the VPA is fixed.  The "registers" for the SLB
      shadow buffer and dispatch trace log (DTL) are 128 bits long,
      consisting of the guest physical address in the high (first) 64 bits
      and the length in the low 64 bits.
      
      This also fixes a bug where we were calling init_vpa unconditionally,
      leading to an oops when unregistering the VPA.
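
The 128-bit "register" layout described above (address in the first 64 bits, length in the following 64 bits) can be sketched as a two-field struct. The struct and helper names here are hypothetical, and the sketch uses host byte order; it only illustrates the field order, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of a 128-bit SLB shadow / DTL register value. */
struct vpa_area_reg {
    uint64_t addr;    /* guest physical address: first 64 bits */
    uint64_t length;  /* length of the area: following 64 bits */
};

/* Serialize address and length into a 16-byte register buffer. */
static void vpa_area_pack(uint8_t out[16], uint64_t addr, uint64_t length)
{
    struct vpa_area_reg r = { addr, length };

    assert(sizeof(r) == 16);
    memcpy(out, &r, sizeof(r));
}

/* Recover address and length from a 16-byte register buffer. */
static struct vpa_area_reg vpa_area_unpack(const uint8_t in[16])
{
    struct vpa_area_reg r;

    memcpy(&r, in, sizeof(r));
    return r;
}
```
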
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S: Get/set guest FP regs using the GET/SET_ONE_REG interface · a8bd19ef
      Paul Mackerras authored
      This enables userspace to get and set all the guest floating-point
      state using the KVM_[GS]ET_ONE_REG ioctls.  The floating-point state
      includes all of the traditional floating-point registers and the
      FPSCR (floating point status/control register), all the VMX/Altivec
      vector registers and the VSCR (vector status/control register), and
      on POWER7, the vector-scalar registers (note that each FP register
      is the high-order half of the corresponding VSR).
      
      Most of these are implemented in common Book 3S code, except for VSX
      on POWER7.  Because HV and PR differ in how they store the FP and VSX
      registers on POWER7, the code for these cases is not common.  On POWER7,
      the FP registers are the upper halves of the VSX registers vsr0 - vsr31.
      PR KVM stores vsr0 - vsr31 in two halves, with the upper halves in the
      arch.fpr[] array and the lower halves in the arch.vsr[] array, whereas
      HV KVM on POWER7 stores the whole VSX register in arch.vsr[].
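
The difference between the two storage layouts can be sketched as below. The struct definitions are simplified assumptions (only vsr0 - vsr31 are modelled, and the `[n][0]` = upper half convention for the HV layout is chosen for illustration); they are not the kernel's actual `arch` structures.

```c
#include <assert.h>
#include <stdint.h>

#define NR_FPR 32

/* PR-style layout: upper halves in fpr[], lower halves in vsr[]. */
struct pr_fp_state {
    uint64_t fpr[NR_FPR];
    uint64_t vsr[NR_FPR];
};

/* HV-style layout: whole register in vsr[]; [n][0] is the upper half
 * in this sketch (an assumption made for illustration). */
struct hv_fp_state {
    uint64_t vsr[NR_FPR][2];
};

struct vsr_value {
    uint64_t hi;
    uint64_t lo;
};

static struct vsr_value pr_get_vsr(const struct pr_fp_state *s, unsigned n)
{
    struct vsr_value v;

    assert(n < NR_FPR);
    v.hi = s->fpr[n];   /* the FP register is the upper half */
    v.lo = s->vsr[n];
    return v;
}

static struct vsr_value hv_get_vsr(const struct hv_fp_state *s, unsigned n)
{
    struct vsr_value v;

    assert(n < NR_FPR);
    v.hi = s->vsr[n][0];
    v.lo = s->vsr[n][1];
    return v;
}

/* Reading FP register n from the HV layout means taking the upper half. */
static uint64_t hv_get_fpr(const struct hv_fp_state *s, unsigned n)
{
    assert(n < NR_FPR);
    return s->vsr[n][0];
}
```

Since the same logical register lives in different places, each flavor needs its own accessor, which is why this part of the code cannot be shared.
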
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix whitespace, vsx compilation]
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S: Get/set guest SPRs using the GET/SET_ONE_REG interface · a136a8bd
      Paul Mackerras authored
      This enables userspace to get and set various SPRs (special-purpose
      registers) using the KVM_[GS]ET_ONE_REG ioctls.  With this, userspace
      can get and set all the SPRs that are part of the guest state, either
      through the KVM_[GS]ET_REGS ioctls, the KVM_[GS]ET_SREGS ioctls, or
      the KVM_[GS]ET_ONE_REG ioctls.
      
      The SPRs that are added here are:
      
      - DABR:  Data address breakpoint register
      - DSCR:  Data stream control register
      - PURR:  Processor utilization of resources register
      - SPURR: Scaled PURR
      - DAR:   Data address register
      - DSISR: Data storage interrupt status register
      - AMR:   Authority mask register
      - UAMOR: User authority mask override register
      - MMCR0, MMCR1, MMCRA: Performance monitor unit control registers
      - PMC1..PMC8: Performance monitor unit counter registers
      
      In order to reduce code duplication between PR and HV KVM code, this
      moves the kvm_vcpu_ioctl_[gs]et_one_reg functions into book3s.c and
      centralizes the copying between user and kernel space there.  The
      registers that are handled differently between PR and HV, and those
      that exist only in one flavor, are handled in kvmppc_[gs]et_one_reg()
      functions that are specific to each flavor.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: minimal style fixes]
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: set IN_GUEST_MODE before checking requests · 5bd1cf11
      Scott Wood authored
      Avoid a race as described in the code comment.
      
      Also remove a related smp_wmb() from booke's kvmppc_prepare_to_enter().
      I can't see any reason for it, and the book3s_pr version doesn't have it.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: e500: MMU API: fix leak of shared_tlb_pages · adbb48a8
      Scott Wood authored
      This was found by kmemleak.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: e500: fix allocation size error on g2h_tlb1_map · e400e72f
      Scott Wood authored
      We were only allocating half the bytes we need, which was made more
      obvious by a recent fix to the memset in clear_tlb1_bitmap().
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Cc: stable@vger.kernel.org
    • KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation · 70bddfef
      Paul Mackerras authored
      In the case where the host kernel is using a 64kB base page size and
      the guest uses a 4k HPTE (hashed page table entry) to map an emulated
      MMIO device, we were calculating the guest physical address wrongly.
      We were calculating a gfn as the guest physical address shifted right
      16 bits (PAGE_SHIFT) but then only adding back in 12 bits from the
      effective address, since the HPTE had a 4k page size.  Thus the gpa
      reported to userspace was missing 4 bits.
      
      Instead, we now compute the guest physical address from the HPTE
      without reference to the host page size, and then compute the gfn
      by shifting the gpa right PAGE_SHIFT bits.
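
A worked example of the arithmetic, as a sketch with hypothetical function names and sample values. It assumes a 64kB host page (shift of 16) and a 4k HPTE page (shift of 12), as in the message, and contrasts rebuilding the address from the gfn with computing it directly from the HPTE's page base.

```c
#include <assert.h>
#include <stdint.h>

#define HOST_PAGE_SHIFT 16  /* 64kB host base page size */
#define HPTE_PAGE_SHIFT 12  /* 4k page mapped by the guest HPTE */

/* Old calculation: gfn from the host page size, then only 12 offset bits. */
static uint64_t gpa_buggy(uint64_t hpte_gpa_base, uint64_t ea)
{
    uint64_t psize = 1ULL << HPTE_PAGE_SHIFT;
    uint64_t gfn = hpte_gpa_base >> HOST_PAGE_SHIFT;

    return (gfn << HOST_PAGE_SHIFT) | (ea & (psize - 1));
}

/* Fixed calculation: gpa comes from the HPTE alone, independent of host page size. */
static uint64_t gpa_fixed(uint64_t hpte_gpa_base, uint64_t ea)
{
    uint64_t psize = 1ULL << HPTE_PAGE_SHIFT;

    assert((hpte_gpa_base & (psize - 1)) == 0);
    return hpte_gpa_base | (ea & (psize - 1));
}

/* The gfn is then derived from the full gpa. */
static uint64_t gfn_of(uint64_t gpa)
{
    return gpa >> HOST_PAGE_SHIFT;
}
```

For a 4k page at guest physical 0x12345000 and an effective address ending in 0x678, the fixed version yields 0x12345678, while the old one yields 0x12340678: bits 12 to 15 (here 0x5000) are lost.
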
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs · 964ee98c
      Paul Mackerras authored
      When making a vcpu non-runnable we incorrectly changed the
      thread IDs of all other threads on the core.  Just remove that
      code.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Fix updates of vcpu->cpu · a47d72f3
      Paul Mackerras authored
      This removes the powerpc "generic" updates of vcpu->cpu in load and
      put, and moves them to the various backends.
      
      The reason is that "HV" KVM manages that field in its own way,
      and the generic updates might corrupt it.  For all the VCPU
      threads of a core, the field always contains the CPU number of
      the first HW CPU of the core (the one that is online from the
      host Linux perspective).
      
      However, the preempt notifiers are going to be called on the
      threads' VCPUs when they are running (due to them sleeping on our
      private waitqueue), causing unload to be called and potentially
      clobbering the value.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: Move some PPC ioctl definitions to the correct place · ed7a8d7a
      Paul Mackerras authored
      This moves the definitions of KVM_CREATE_SPAPR_TCE and
      KVM_ALLOCATE_RMA in include/linux/kvm.h from the section listing the
      vcpu ioctls to the section listing VM ioctls, as these are both
      implemented and documented as VM ioctls.
      
      Fortunately there is no actual collision of ioctl numbers at this
      point.  Moving these to the correct section will reduce the
      probability of a future collision.  This does not change the
      user/kernel ABI at all.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly · dfe49dbd
      Paul Mackerras authored
      This adds an implementation of kvm_arch_flush_shadow_memslot for
      Book3S HV, and arranges for kvmppc_core_commit_memory_region to
      flush the dirty log when modifying an existing slot.  With this,
      we can handle deletion and modification of memory slots.
      
      kvm_arch_flush_shadow_memslot calls kvmppc_core_flush_memslot, which
      on Book3S HV now traverses the reverse map chains to remove any HPT
      (hashed page table) entries referring to pages in the memslot.  This
      gets called by generic code whenever deleting a memslot or changing
      the guest physical address for a memslot.
      
      We flush the dirty log in kvmppc_core_commit_memory_region for
      consistency with what x86 does.  We only need to flush when an
      existing memslot is being modified, because for a new memslot the
      rmap array (which stores the dirty bits) is all zero, meaning that
      every page is considered clean already, and when deleting a memslot
      we obviously don't care about the dirty bits any more.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Move kvm->arch.slot_phys into memslot.arch · a66b48c3
      Paul Mackerras authored
      Now that we have an architecture-specific field in the kvm_memory_slot
      structure, we can use it to store the array of page physical addresses
      that we need for Book3S HV KVM on PPC970 processors.  This reduces the
      size of struct kvm_arch for Book3S HV, and also reduces the size of
      struct kvm_arch_memory_slot for other PPC KVM variants since the fields
      in it are now only compiled in for Book3S HV.
      
      This necessitates making the kvm_arch_create_memslot and
      kvm_arch_free_memslot operations specific to each PPC KVM variant.
      That in turn means that we now don't allocate the rmap arrays on
      Book3S PR and Book E.
      
      Since we now unpin pages and free the slot_phys array in
      kvmppc_core_free_memslot, we no longer need to do it in
      kvmppc_core_destroy_vm, since the generic code takes care to free
      all the memslots when destroying a VM.
      
      We now need the new memslot to be passed in to
      kvmppc_core_prepare_memory_region, since we need to initialize its
      arch.slot_phys member on Book3S HV.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Take the SRCU read lock before looking up memslots · 2c9097e4
      Paul Mackerras authored
      The generic KVM code uses SRCU (sleeping RCU) to protect accesses
      to the memslots data structures against updates due to userspace
      adding, modifying or removing memory slots.  We need to do that too,
      both to avoid accessing stale copies of the memslots and to avoid
      lockdep warnings.  This therefore adds srcu_read_lock/unlock pairs
      around code that accesses and uses memslots.
      
      Since the real-mode handlers for H_ENTER, H_REMOVE and H_BULK_REMOVE
      need to access the memslots, and we don't want to call the SRCU code
      in real mode (since we have no assurance that it would only access
      the linear mapping), we hold the SRCU read lock for the VM while
      in the guest.  This does mean that adding or removing memory slots
      while some vcpus are executing in the guest will block for up to
      two jiffies.  This tradeoff is acceptable since adding/removing
      memory slots only happens rarely, while H_ENTER/H_REMOVE/H_BULK_REMOVE
      are performance-critical hot paths.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>