1. 06 Dec, 2012 5 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs · 44e5f6be
      Paul Mackerras authored
      This uses a bit in our record of the guest view of the HPTE to record
      when the HPTE gets modified.  We use a reserved bit for this, and ensure
      that this bit is always cleared in HPTE values returned to the guest.
      
      The recording of modified HPTEs is only done if other code indicates
      its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.
      The reason for this is that when later commits add facilities for
      userspace to read the HPT, the first pass of reading the HPT will be
      quicker if there are no (or very few) HPTEs marked as modified,
      rather than having most HPTEs marked as modified.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      44e5f6be
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Fix bug causing loss of page dirty state · 4879f241
      Paul Mackerras authored
      This fixes a bug where adding a new guest HPT entry via the H_ENTER
      hcall would lose the "changed" bit in the reverse map information
      for the guest physical page being mapped.  The result was that the
      KVM_GET_DIRTY_LOG could return a zero bit for the page even though
      the page had been modified by the guest.
      
      This fixes it by only modifying the index and present bits in the
      reverse map entry, thus preserving the reference and change bits.
      We were also unnecessarily setting the reference bit, and this
      fixes that too.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      4879f241
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Restructure HPT entry creation code · 7ed661bf
      Paul Mackerras authored
      This restructures the code that creates HPT (hashed page table)
      entries so that it can be called in situations where we don't have a
      struct vcpu pointer, only a struct kvm pointer.  It also fixes a bug
      where kvmppc_map_vrma() would corrupt the guest R4 value.
      
      Most of the work of kvmppc_virtmode_h_enter is now done by a new
      function, kvmppc_virtmode_do_h_enter, which itself calls another new
      function, kvmppc_do_h_enter, which contains most of the old
      kvmppc_h_enter.  The new kvmppc_do_h_enter takes explicit arguments
      for the place to return the HPTE index, the Linux page tables to use,
      and whether it is being called in real mode, thus removing the need
      for it to have the vcpu as an argument.
      
      Currently kvmppc_map_vrma creates the VRMA (virtual real mode area)
      HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily
      to handle H_ENTER hcalls from the guest that need to pin a page of
      memory.  Since H_ENTER returns the index of the created HPTE in R4,
      kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4
      in the case when it gets called from kvmppc_map_vrma on the first
      VCPU_RUN ioctl.  With this, kvmppc_map_vrma instead calls
      kvmppc_virtmode_do_h_enter with the address of a dummy word as the
      place to store the HPTE index, thus avoiding corrupting the guest R4.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      7ed661bf
    • Alexander Graf's avatar
      KVM: PPC: Support eventfd · 0e673fb6
      Alexander Graf authored
      In order to support the generic eventfd infrastructure on PPC, we need
      to call into the generic KVM in-kernel device mmio code.
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      0e673fb6
    • Alexander Graf's avatar
      KVM: Distangle eventfd code from irqchip · 914daba8
      Alexander Graf authored
      The current eventfd code assumes that when we have eventfd, we also have
      irqfd for in-kernel interrupt delivery. This is not necessarily true. On
      PPC we don't have an in-kernel irqchip yet, but we can still support easily
      support eventfd.
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      914daba8
  2. 02 Dec, 2012 1 commit
  3. 30 Nov, 2012 3 commits
    • Will Auld's avatar
      KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635
      Will Auld authored
      CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
      
      Basic design is to emulate the MSR by allowing reads and writes to a guest
      vcpu specific location to store the value of the emulated MSR while adding
      the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
      be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
      is of course as long as the "use TSC counter offsetting" VM-execution control
      is enabled as well as the IA32_TSC_ADJUST control.
      
      However, because hardware will only return the TSC + IA32_TSC_ADJUST +
      vmsc tsc_offset for a guest process when it does and rdtsc (with the correct
      settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one
      of these three locations. The argument against storing it in the actual MSR
      is performance. This is likely to be seldom used while the save/restore is
      required on every transition. IA32_TSC_ADJUST was created as a way to solve
      some issues with writing TSC itself so that is not an option either.
      
      The remaining option, defined above as our solution has the problem of
      returning incorrect vmcs tsc_offset values (unless we intercept and fix, not
      done here) as mentioned above. However, more problematic is that storing the
      data in vmcs tsc_offset will have a different semantic effect on the system
      than does using the actual MSR. This is illustrated in the following example:
      
      The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest
      process performs a rdtsc. In this case the guest process will get
      TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including
      IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics
      as seen by the guest do not and hence this will not cause a problem.
      Signed-off-by: default avatarWill Auld <will.auld@intel.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      ba904635
    • Will Auld's avatar
      KVM: x86: Add code to track call origin for msr assignment · 8fe8ab46
      Will Auld authored
      In order to track who initiated the call (host or guest) to modify an msr
      value I have changed function call parameters along the call path. The
      specific change is to add a struct pointer parameter that points to (index,
      data, caller) information rather than having this information passed as
      individual parameters.
      
      The initial use for this capability is for updating the IA32_TSC_ADJUST msr
      while setting the tsc value. It is anticipated that this capability is
      useful for other tasks.
      Signed-off-by: default avatarWill Auld <will.auld@intel.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      8fe8ab46
    • Alex Williamson's avatar
      KVM: Fix user memslot overlap check · 5419369e
      Alex Williamson authored
      Prior to memory slot sorting this loop compared all of the user memory
      slots for overlap with new entries.  With memory slot sorting, we're
      just checking some number of entries in the array that may or may not
      be user slots.  Instead, walk all the slots with kvm_for_each_memslot,
      which has the added benefit of terminating early when we hit the first
      empty slot, and skip comparison to private slots.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      5419369e
  4. 29 Nov, 2012 2 commits
  5. 28 Nov, 2012 19 commits
  6. 14 Nov, 2012 3 commits
  7. 01 Nov, 2012 2 commits
    • Marcelo Tosatti's avatar
      Merge branch 'for-queue' of https://github.com/agraf/linux-2.6 into queue · f026399f
      Marcelo Tosatti authored
      * 'for-queue' of https://github.com/agraf/linux-2.6:
        PPC: ePAPR: Convert hcall header to uapi (round 2)
        KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte()
        KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0
        KVM: PPC: Book3S HV: Fix accounting of stolen time
        KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run
        KVM: PPC: Book3S HV: Fixes for late-joining threads
        KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock
        KVM: PPC: Book3S HV: Fix some races in starting secondary threads
        KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online
        PPC: ePAPR: Convert header to uapi
        KVM: PPC: Move mtspr/mfspr emulation into own functions
        KVM: Documentation: Fix reentry-to-be-consistent paragraph
        KVM: PPC: 44x: fix DCR read/write
      f026399f
    • Joerg Roedel's avatar
      KVM: SVM: update MAINTAINERS entry · 7de609c8
      Joerg Roedel authored
      I have no access to my AMD email address anymore. Update
      entry in MAINTAINERS to the new address.
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarJoerg Roedel <joro@8bytes.org>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      7de609c8
  8. 31 Oct, 2012 2 commits
  9. 30 Oct, 2012 3 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte() · 8b5869ad
      Paul Mackerras authored
      This fixes an error in the inline asm in try_lock_hpte() where we
      were erroneously using a register number as an immediate operand.
      The bug only affects an error path, and in fact the code will still
      work as long as the compiler chooses some register other than r0
      for the "bits" variable.  Nevertheless it should still be fixed.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      8b5869ad
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0 · 9f8c8c78
      Paul Mackerras authored
      Commit 55b665b0 ("KVM: PPC: Book3S HV: Provide a way for userspace
      to get/set per-vCPU areas") includes a check on the length of the
      dispatch trace log (DTL) to make sure the buffer is at least one entry
      long.  This is appropriate when registering a buffer, but the
      interface also allows for any existing buffer to be unregistered by
      specifying a zero address.  In this case the length check is not
      appropriate.  This makes the check conditional on the address being
      non-zero.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      9f8c8c78
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Fix accounting of stolen time · c7b67670
      Paul Mackerras authored
      Currently the code that accounts stolen time tends to overestimate the
      stolen time, and will sometimes report more stolen time in a DTL
      (dispatch trace log) entry than has elapsed since the last DTL entry.
      This can cause guests to underflow the user or system time measured
      for some tasks, leading to ridiculous CPU percentages and total runtimes
      being reported by top and other utilities.
      
      In addition, the current code was designed for the previous policy where
      a vcore would only run when all the vcpus in it were runnable, and so
      only counted stolen time on a per-vcore basis.  Now that a vcore can
      run while some of the vcpus in it are doing other things in the kernel
      (e.g. handling a page fault), we need to count the time when a vcpu task
      is preempted while it is not running as part of a vcore as stolen also.
      
      To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
      vcpu_load/put functions to count preemption time while the vcpu is
      in that state.  Handling the transitions between the RUNNING and
      BUSY_IN_HOST states requires checking and updating two variables
      (accumulated time stolen and time last preempted), so we add a new
      spinlock, vcpu->arch.tbacct_lock.  This protects both the per-vcpu
      stolen/preempt-time variables, and the per-vcore variables while this
      vcpu is running the vcore.
      
      Finally, we now don't count time spent in userspace as stolen time.
      The task could be executing in userspace on behalf of the vcpu, or
      it could be preempted, or the vcpu could be genuinely stopped.  Since
      we have no way of dividing up the time between these cases, we don't
      count any of it as stolen.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      c7b67670