1. 30 May, 2014 20 commits
    • MIPS: KVM: Rewrite count/compare timer emulation · e30492bb
      James Hogan authored
      Previously the emulation of the CPU timer was just enough to get a Linux
      guest running, but some shortcuts were taken:
       - The guest timer interrupt was hard-coded to always happen every 10 ms
         rather than being timed to when CP0_Count would match CP0_Compare.
       - The guest's CP0_Count register was based on the host's CP0_Count
         register. This isn't very portable and fails on cores without a
         CP0_Count register implemented, such as Ingenic XBurst. It also meant
         that the guest's CP0_Cause.DC bit, which disables the CP0_Count
         register, had no effect.
       - The guest's CP0_Count register was emulated by dividing the host's
         CP0_Count register by 4 as a signed value. This caused continuity
         problems when it was used as a clock source: when the host CP0_Count
         overflowed from 0x7fffffff to 0x80000000, the guest CP0_Count jumped
         discontinuously from 0x1fffffff to 0xe0000000.
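
      The jump in the last bullet is what signed division produces. The
      sketch below reproduces it. It assumes, as the quoted values imply, that
      the old code treated the host count as a signed 32-bit value. The
      function names are hypothetical and are not taken from the kernel.
```c
#include <stdint.h>

/* Hypothetical model of the old scheme: the host count is divided by 4
 * while being treated as a signed 32-bit value. */
static uint32_t old_guest_count(int32_t host_count)
{
	return (uint32_t)(host_count / 4);
}

/* For comparison: unsigned division keeps the guest count continuous
 * across 0x7fffffff -> 0x80000000, although it would still only span
 * 0..0x3fffffff and so wrap early. */
static uint32_t unsigned_guest_count(uint32_t host_count)
{
	return host_count / 4;
}
```
      In this model, old_guest_count(INT32_MAX) is 0x1fffffff. A host count
      of 0x80000000 (INT32_MIN) gives 0xe0000000.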
      
      Therefore rewrite and fix the emulation of the guest timer based on
      monotonic kernel time (i.e. ktime_get()). Internally, a 32-bit
      count_bias value is added to the frequency-scaled nanosecond monotonic
      time to give the guest's CP0_Count. The timer frequency is initialised
      to 100MHz and cannot yet be changed, but a later patch will allow it to
      be configured via the KVM_{GET,SET}_ONE_REG ioctl interface.
      
      The timer can now be stopped via the CP0_Cause.DC bit (by the guest or
      via the KVM_SET_ONE_REG ioctl interface), at which point the current
      CP0_Count is stored and can be read directly. When it is restarted, the
      bias is recalculated so that the CP0_Count value is continuous.
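
      Below is a minimal sketch of the bias scheme from the two paragraphs
      above, in plain C so that it runs outside the kernel. It makes three
      simplifying assumptions. The caller passes in a monotonic nanosecond
      timestamp instead of calling ktime_get(). The hrtimer is left out
      entirely. The struct and function names are hypothetical; only
      count_bias comes from the text.
```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical, simplified guest timer state. */
struct guest_timer {
	uint64_t count_hz;    /* timer frequency, e.g. 100000000 for 100MHz */
	uint32_t count_bias;  /* added to the scaled time to give CP0_Count */
	int stopped;          /* models CP0_Cause.DC == 1 */
	uint32_t saved_count; /* CP0_Count captured when the timer stopped */
};

/* Scale a nanosecond timestamp to timer ticks. Splitting off whole seconds
 * keeps rem * hz within 64 bits for realistic frequencies. Any wrap in
 * secs * hz is harmless, because only the low 32 bits are kept
 * (CP0_Count is a 32-bit register). */
static uint32_t ns_to_ticks(uint64_t ns, uint64_t hz)
{
	uint64_t secs = ns / NSEC_PER_SEC;
	uint64_t rem = ns % NSEC_PER_SEC;

	return (uint32_t)(secs * hz + rem * hz / NSEC_PER_SEC);
}

static uint32_t read_count(const struct guest_timer *t, uint64_t now_ns)
{
	if (t->stopped)
		return t->saved_count;	/* stopped: read the stored value */
	return t->count_bias + ns_to_ticks(now_ns, t->count_hz);
}

static void stop_count(struct guest_timer *t, uint64_t now_ns)
{
	t->saved_count = read_count(t, now_ns);
	t->stopped = 1;
}

/* Recompute the bias so that counting resumes from the stored value. */
static void start_count(struct guest_timer *t, uint64_t now_ns)
{
	t->count_bias = t->saved_count - ns_to_ticks(now_ns, t->count_hz);
	t->stopped = 0;
}
```
      For example, take count_hz at 100MHz. A timer stopped at a count of
      100 and restarted about 5ms later reads 100 again. It then advances by
      one tick every 10ns.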
      
      Due to the nature of hrtimer interrupts, any read of the guest's
      CP0_Count register while it is running triggers a check for whether the
      hrtimer has expired, so that the guest/userland cannot observe the
      CP0_Count passing CP0_Compare without queuing a timer interrupt. This is
      also taken advantage of when stopping the timer to ensure that a pending
      timer interrupt is queued.
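
      The read-side check described above can be sketched as follows. This
      is a simplified model, not the kernel code. The hrtimer is reduced to
      an armed flag and an absolute expiry time, and CP0_Cause.TI is reduced
      to a boolean. The struct and function names are hypothetical.
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the hrtimer is due at expires_ns, but its callback
 * may not have run yet even though that time has already passed. */
struct timer_model {
	bool armed;
	uint64_t expires_ns;
	bool ti_pending;	/* models CP0_Cause.TI */
};

/* Run on every read of a running CP0_Count. If the deadline has passed,
 * queue the timer interrupt now instead of waiting for the callback, so
 * the reader never sees Count pass Compare with no interrupt pending.
 * Re-arming for the next match is left out of this model. */
static void check_expired(struct timer_model *t, uint64_t now_ns)
{
	if (t->armed && now_ns >= t->expires_ns) {
		t->armed = false;
		t->ti_pending = true;
	}
}
```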
      
      This replaces the implementation of:
       - Guest read of CP0_Count
       - Guest write of CP0_Count
       - Guest write of CP0_Compare
       - Guest write of CP0_Cause
       - Guest read of HWR 2 (CC) with RDHWR
       - Host read of CP0_Count via KVM_GET_ONE_REG ioctl interface
       - Host write of CP0_Count via KVM_SET_ONE_REG ioctl interface
       - Host write of CP0_Compare via KVM_SET_ONE_REG ioctl interface
       - Host write of CP0_Cause via KVM_SET_ONE_REG ioctl interface
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Migrate hrtimer to follow VCPU · 3a0ba774
      James Hogan authored
      When a VCPU is scheduled in on a different CPU, refresh the hrtimer used
      for emulating count/compare so that it gets migrated to the same CPU.
      
      This should prevent a timer interrupt from occurring on a CPU other than
      the one running the guest it relates to, which would delay delivery of
      the guest timer interrupt until after the next guest exit.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Fix timer race modifying guest CP0_Cause · c73c99b0
      James Hogan authored
      The hrtimer callback for guest timer timeouts sets the guest's
      CP0_Cause.TI bit to indicate to the guest that a timer interrupt is
      pending. However, there is no mutual exclusion to prevent this from
      happening while the guest's CP0_Cause register is being
      read-modify-written elsewhere.
      
      When this happens, the setting of the CP0_Cause.TI bit is undone, so the
      guest misses the timer interrupt and doesn't reprogram the CP0_Compare
      register for the next timeout. With the current timer emulation another
      timer interrupt is triggered 10ms later anyway. Once the MIPS timer
      emulation is fixed, however, this would leave Linux guest time standing
      still. The guest scheduler would not be invoked until the guest
      CP0_Count had wrapped around again, which at 100MHz takes just under
      43 seconds.
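
      The 43-second figure follows from CP0_Count being a 32-bit counter. One
      full wrap at the 100MHz rate quoted above takes:
```latex
\frac{2^{32}}{100 \times 10^{6}\ \mathrm{s}^{-1}}
  = \frac{4\,294\,967\,296}{100\,000\,000}\ \mathrm{s}
  \approx 42.95\ \mathrm{s}
```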
      
      Since this is currently the only asynchronous modification of guest
      registers, it is fixed by adjusting three macros used to modify the
      guest CP0_Cause register: kvm_set_c0_guest_cause(),
      kvm_clear_c0_guest_cause() and kvm_change_c0_guest_cause(). They now
      use ll/sc to ensure atomic modification. This should work in both UP
      and SMP cases without requiring interrupts to be disabled.
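
      The kernel macros use MIPS ll/sc inline assembly. The helpers below are
      a portable sketch of the same retry pattern that substitutes a C11
      compare-exchange loop. The function names are hypothetical stand-ins
      for kvm_change_c0_guest_cause() and friends, not their actual
      definitions.
```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically replace the bits selected by `change` with those in `val`.
 * Another context (such as the hrtimer callback setting TI) may modify
 * the word between the load and the store. If so, the compare-exchange
 * fails, `old` is refreshed with the current value, and the update is
 * retried, much as a failed sc sends an ll/sc loop round again. */
static void change_cause(_Atomic uint32_t *cause, uint32_t change, uint32_t val)
{
	uint32_t old = atomic_load(cause);
	uint32_t new_val;

	do {
		new_val = (old & ~change) | (val & change);
	} while (!atomic_compare_exchange_weak(cause, &old, new_val));
}

static void set_cause(_Atomic uint32_t *cause, uint32_t bits)
{
	change_cause(cause, bits, bits);
}

static void clear_cause(_Atomic uint32_t *cause, uint32_t bits)
{
	change_cause(cause, bits, 0);
}
```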
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Deliver guest interrupts after local_irq_disable() · 044f0f03
      James Hogan authored
      When about to run the guest, deliver guest interrupts after disabling
      host interrupts. This should prevent an hrtimer interrupt from being
      handled after guest interrupts have been delivered, which would leave
      the guest timer interrupt undelivered until after the next guest exit.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Add CP0_HWREna KVM register access · 16fd5c1d
      James Hogan authored
      Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
      HWREna register. This lets userland save and restore its value, so that
      RDHWR instructions don't have to be emulated by the guest.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Add CP0_UserLocal KVM register access · 7767b7d2
      James Hogan authored
      Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
      UserLocal register. This is so that userland can save and restore its
      value.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Add CP0_Count/Compare KVM register access · f8be02da
      James Hogan authored
      Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
      Count and Compare registers. These registers are special in that writing
      to them has side effects (adjusting the time until the next timer
      interrupt) and reading of Count depends on the time. Therefore add a
      couple of callbacks so that different implementations (trap & emulate or
      VZ) can implement them differently depending on what the hardware
      provides.
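
      The callback indirection can be pictured as a table of function
      pointers chosen per implementation. The sketch below is illustrative
      only. The struct, the register ids and all function names are
      hypothetical, and the "timer" is just a counter of side effects.
```c
#include <stdint.h>

/* Hypothetical, much-simplified VCPU state. */
struct vcpu {
	uint32_t count;
	uint32_t compare;
	unsigned timer_updates;	/* how often the emulated timer was adjusted */
};

/* Per-implementation hooks for registers whose access has side effects. */
struct reg_callbacks {
	uint32_t (*get_count)(struct vcpu *vcpu);
	void (*set_count)(struct vcpu *vcpu, uint32_t count);
	void (*set_compare)(struct vcpu *vcpu, uint32_t compare);
};

/* Trap & emulate flavour: the registers live in software, and every
 * write must also adjust the emulated timer. */
static uint32_t te_get_count(struct vcpu *v)
{
	return v->count;
}

static void te_set_count(struct vcpu *v, uint32_t count)
{
	v->count = count;
	v->timer_updates++;
}

static void te_set_compare(struct vcpu *v, uint32_t compare)
{
	v->compare = compare;
	v->timer_updates++;
}

static const struct reg_callbacks te_callbacks = {
	.get_count = te_get_count,
	.set_count = te_set_count,
	.set_compare = te_set_compare,
};

/* A VZ implementation would install its own table here. */
static const struct reg_callbacks *callbacks = &te_callbacks;

enum { REG_COUNT, REG_COMPARE };	/* hypothetical register ids */

/* Generic ONE_REG-style setter: dispatches without knowing which
 * implementation is in use. Returns -1 for unknown ids. */
static int set_one_reg(struct vcpu *v, int id, uint32_t val)
{
	switch (id) {
	case REG_COUNT:
		callbacks->set_count(v, val);
		return 0;
	case REG_COMPARE:
		callbacks->set_compare(v, val);
		return 0;
	default:
		return -1;
	}
}
```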
      
      The trap & emulate versions mostly duplicate what happens when a T&E
      guest reads or writes these registers, so they inherit the same
      limitations, which can be fixed in later patches.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Move KVM_{GET,SET}_ONE_REG definitions into kvm_host.h · 48a3c4e4
      James Hogan authored
      Move the KVM_{GET,SET}_ONE_REG MIPS register id definitions out of
      kvm_mips.c to kvm_host.h so that they can be shared between multiple
      source files. This allows register access to be indirected depending on
      the underlying implementation (trap & emulate or VZ).
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Add CP0_EPC KVM register access · fb6df0cd
      James Hogan authored
      Contrary to the comment, the guest CP0_EPC register cannot be set via
      kvm_regs, since it is distinct from the guest PC. Add the EPC register
      to the KVM_{GET,SET}_ONE_REG ioctl interface.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Use tlb_write_random · b5dfc6c1
      James Hogan authored
      When MIPS KVM needs to write a TLB entry for the guest, it reads the
      CP0_Random register, uses it to generate the CP0_Index, and writes the
      TLB entry using the TLBWI instruction (tlb_write_indexed()).
      
      However, there is an instruction that does exactly that, TLBWR
      (tlb_write_random()), so use it instead.
      
      This also happens to fix an issue on Ingenic XBurst cores where the
      same TLB entry was replaced each time. This prevented forward progress
      on stores, because execution alternated between TLB load misses for
      the instruction fetch and TLB store misses for the store.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Use local_flush_icache_range to fix RI on XBurst · facaaec1
      James Hogan authored
      MIPS KVM uses mips32_SyncICache to synchronise the icache with the
      dcache after dynamically modifying guest instructions or writing the
      guest exception vector. However, this uses RDHWR to get the SYNCI step,
      which causes a reserved instruction exception on Ingenic XBurst cores.
      
      It would seem to make more sense to use local_flush_icache_range()
      instead, which does the same thing but is more portable.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: Export local_flush_icache_range for KVM · 90f91356
      James Hogan authored
      Export the local_flush_icache_range function pointer for GPL modules so
      that it can be used by KVM for syncing the icache after binary
      translation of trapping instructions.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Allocate at least 16KB for exception handlers · 7006e2df
      James Hogan authored
      Each MIPS KVM guest has its own copy of the KVM exception vector. This
      contains the TLB refill exception handler at offset 0x000, the general
      exception handler at offset 0x180, and interrupt exception handlers at
      offset 0x200 for use when CP0_Cause.IV=1. A common handler is copied to
      offset 0x2000, and offset 0x3000 is used for temporarily storing k1
      during entry from the guest.
      
      However, the amount of memory allocated for this purpose is calculated as
      0x200 rounded up to the next page boundary, which is insufficient if 4KB
      pages are in use. This can lead to the common handler at offset 0x2000
      being overwritten and infinitely recursive exceptions on the next exit
      from the guest.
      
      Increase the minimum size from 0x200 to 0x4000 so that the whole range
      in use, including the common handler at 0x2000 and the k1 save area at
      0x3000, is always allocated.
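
      The arithmetic behind the bug can be checked with a few lines of C.
      This is a sketch. The macro and function names are hypothetical, and
      16KB serves only as an example of a page size large enough to hide the
      problem.
```c
#include <stddef.h>

/* Offsets within each guest's copy of the exception vector, as listed
 * above. */
#define TLB_REFILL_OFFSET	0x000
#define GENERAL_EXC_OFFSET	0x180
#define INTERRUPT_OFFSET	0x200
#define COMMON_HANDLER_OFFSET	0x2000
#define K1_SAVE_OFFSET		0x3000

/* Round size up to a multiple of page_size, which must be a power of two. */
static size_t round_up_to_page(size_t size, size_t page_size)
{
	return (size + page_size - 1) & ~(page_size - 1);
}

/* Is byte offset `off` inside an allocation of `alloc` bytes? */
static int offset_in_alloc(size_t off, size_t alloc)
{
	return off < alloc;
}
```
      With 4KB pages, 0x200 rounds up to only 0x1000, which leaves the common
      handler at 0x2000 outside the allocation. With 16KB pages, the same
      rounding yields 0x4000, so the problem only shows up with 4KB pages. A
      0x4000 minimum covers every listed offset for either page size.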
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'kvm-s390-20140530' of... · 146b2cfe
      Paolo Bonzini authored
      Merge tag 'kvm-s390-20140530' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
      
      Several minor fixes and cleanups for KVM:
      1. Fix flag check for gdb support
      2. Remove unnecessary vcpu start
      3. Remove code duplication for sigp interrupts
      4. Better DAT handling for the TPROT instruction
      5. Correct addressing exception for standby memory
    • KVM: s390: Intercept the tprot instruction · 5a5e6536
      Matthew Rosato authored
      Based on original patch from Jeng-fang (Nick) Wang
      
      When standby memory is specified for a guest Linux, but no virtual memory has
      been allocated on the QEMU host backing that guest, the guest memory detection
      process encounters a memory access exception which is not thrown from the KVM
      handle_tprot() instruction-handler function. The access exception comes from
      sie64a returning EFAULT, which then passes an addressing exception to the guest.
      Unfortunately this does not do the proper PSW fixup (nullifying vs.
      suppressing), so the guest will get a fault for the wrong address.
      
      Let's just intercept the tprot instruction all the time to do the right thing
      and not go down the page fault handler path for standby memory. tprot is only
      used by Linux during startup, so some exits should be OK.
      Without this patch, standby memory cannot be used with KVM.
      Signed-off-by: Nick Wang <jfwang@us.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • KVM: s390: a VCPU is already started when delivering interrupts · 3192c639
      David Hildenbrand authored
      This patch removes the start of a VCPU when delivering a RESTART interrupt.
      Interrupt delivery is called from kvm_arch_vcpu_ioctl_run, so the VCPU is
      already considered started and there is no need to call
      kvm_s390_vcpu_start, which would exit early anyway.
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • KVM: s390: check the given debug flags, not the set ones · 2de3bfc2
      David Hildenbrand authored
      This patch fixes a minor bug when updating the guest debug settings.
      We should check the given debug flags, not the already set ones.
      This doesn't do any harm, but it allows too many (for now unused) flags
      to be set internally without an error.
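
      The corrected check can be sketched as follows. The flag names, the
      flag values and the function are hypothetical. Only the rule, which is
      to check the requested flags rather than the stored ones, comes from
      the commit.
```c
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define DBG_ENABLE	0x1u
#define DBG_SINGLESTEP	0x2u
#define VALID_DBG_FLAGS	(DBG_ENABLE | DBG_SINGLESTEP)

/* Fixed check: validate the flags the caller asked for. The bug was to
 * test the flags already stored (*stored) instead of `requested`. */
static int set_guest_debug(uint32_t *stored, uint32_t requested)
{
	if (requested & ~VALID_DBG_FLAGS)
		return -1;	/* a real ioctl handler would return an errno */
	*stored = requested;
	return 0;
}
```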
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • KVM: s390: clean up interrupt injection in sigp code · 22ff4a33
      Jens Freimann authored
      We have all the logic to inject interrupts available in
      kvm_s390_inject_vcpu(), so let's use it instead of
      injecting irqs manually to the list in sigp code.
      
      SIGP stop is special because we have to check the
      action_flags before injecting the interrupt. As
      the action_flags are not available in kvm_s390_inject_vcpu(),
      we leave the code for the stop order untouched for now.
      Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
      Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
    • KVM: s390: Enable DAT support for TPROT handler · a0465f9a
      Thomas Huth authored
      The TPROT instruction can be used to check the accessibility of storage
      for any kind of logical address. So far, our handler only supported
      real addresses. This patch now also enables support for addresses that
      have to be translated via DAT first. And while we're at it, change the
      code to use the common KVM function gfn_to_hva_prot() to check the
      validity and writability of the memory page.
      Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
    • KVM: s390: Add a generic function for translating guest addresses · 9fbc0276
      Thomas Huth authored
      This patch adds a function for translating logical guest addresses into
      physical guest addresses without touching the memory at the given location.
      Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
  2. 29 May, 2014 1 commit
  3. 27 May, 2014 4 commits
    • arm: Fix compile warning for psci · d6d7a95c
      Christoffer Dall authored
      Commit e71246a2 changed psci_init from a function returning void to one
      returning int, but did not change the non-CONFIG_ARM_PSCI implementation
      to return a value, which causes a compile warning. Just return 0.
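
      The fix follows the usual pattern for configurable kernel features: a
      real declaration when the option is enabled and an inline stub
      otherwise. Below is a minimal sketch that assumes this layout for the
      psci_init() declaration. It is not the actual header.
```c
/* Sketch of the config-stub pattern. Only the stub's return value
 * reflects the fix; where the prototype lives is not shown here. */
#ifdef CONFIG_ARM_PSCI
int psci_init(void);
#else
static inline int psci_init(void)
{
	return 0;	/* no PSCI support built in: nothing to initialise */
}
#endif
```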
      
      Cc: Ashwin Chaugule <ashwin.chaugule@linaro.org>
      Cc: Shawn Guo <shawn.guo@freescale.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'kvm-arm-for-3.16' of... · 04092204
      Paolo Bonzini authored
      Merge tag 'kvm-arm-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next
      
      Changes for the 3.16 merge window.
      
      This includes KVM support for PSCI v0.2 and also includes generic Linux
      support for PSCI v0.2 (on hosts that advertise that feature via their
      DT), since the latter depends on headers introduced by the former.
      
      Finally there's a small patch from Marc that enables Cortex-A53 support.
    • KVM: x86: MOV CR/DR emulation should ignore mod · 9b88ae99
      Nadav Amit authored
      MOV CR/DR instructions ignore the mod field (in the ModR/M byte). As the SDM
      states: "The 2 bits in the mod field are ignored".  Accordingly, the second
      operand of these instructions is always a general purpose register.
      
      The current emulator implementation does not do this: if the mod bits do
      not equal 3, it expects the second operand to be in memory.
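
      A small decoding sketch shows the difference. The struct and function
      names are hypothetical. REX prefixes, which extend reg and r/m to four
      bits in 64-bit mode, are ignored.
```c
#include <stdint.h>

struct modrm {
	unsigned mod;	/* bits 7:6 */
	unsigned reg;	/* bits 5:3: the CR/DR number for MOV CR/DR */
	unsigned rm;	/* bits 2:0 */
};

static struct modrm decode_modrm(uint8_t byte)
{
	struct modrm m = { byte >> 6, (byte >> 3) & 7, byte & 7 };
	return m;
}

/* Generic rule used by most instructions: mod != 3 means the r/m operand
 * is in memory. Applying it to MOV CR/DR is the bug described above. */
static int rm_is_memory_generic(uint8_t byte)
{
	return decode_modrm(byte).mod != 3;
}

/* MOV CR/DR: mod is ignored, so r/m always names a general purpose
 * register. Returns that register's number. */
static unsigned mov_cr_dr_gpr(uint8_t byte)
{
	return decode_modrm(byte).rm;
}
```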
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: lapic: sync highest ISR to hardware apic on EOI · fc57ac2c
      Paolo Bonzini authored
      When Hyper-V enlightenments are in effect, Windows prefers to signal an
      EOI with a Hyper-V MSR write rather than an x2apic MSR write.
      The Hyper-V MSR write is not handled by the processor, and besides
      being slower, this also causes bugs with APIC virtualization.  The
      reason is that on EOI the processor will modify the highest in-service
      interrupt (SVI) field of the VMCS, as explained in section 29.1.4 of
      the SDM; every other step in EOI virtualization is already done by
      apic_send_eoi or on VM entry, but this one is missing.
      
      We need to do the same, and be careful not to muck with the isr_count
      and highest_isr_cache fields that are unused when virtual interrupt
      delivery is enabled.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 25 May, 2014 1 commit
  5. 22 May, 2014 6 commits
    • KVM: vmx: DR7 masking on task switch emulation is wrong · 1f854112
      Nadav Amit authored
      The DR7 mask applied during task switch emulation should be written in
      hex (0x55), so that it clears the local breakpoint enable bits 0, 2, 4
      and 6.
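
      The two spellings give different masks. In this sketch (the macro and
      function names are hypothetical), 0x55 selects exactly bits 0, 2, 4
      and 6. Decimal 55 is 0x37, which would clear bits 0, 1, 2, 4 and 5.
      That hits the global enables G0 and G2 and misses L3.
```c
#include <stdint.h>

/* L0..L3, the local breakpoint enable bits of DR7, are bits 0, 2, 4, 6. */
#define DR7_LOCAL_ENABLES	0x55u

static uint32_t clear_local_breakpoints(uint32_t dr7)
{
	return dr7 & ~DR7_LOCAL_ENABLES;
}
```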
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86: fix page fault tracing when KVM guest support enabled · 65a7f03f
      Dave Hansen authored
      I noticed on some of my systems that page fault tracing doesn't
      work:
      
      	cd /sys/kernel/debug/tracing
      	echo 1 > events/exceptions/enable
      	cat trace;
      	# nothing shows up
      
      I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
      KVM VM, enabling that option breaks page fault tracing, and
      disabling fixes it.  I tried on some old kernels and this does
      not appear to be a regression: it never worked.
      
      There are two page-fault entry functions today: one used when tracing
      is on and another when it is off.  The KVM code calls do_page_fault()
      directly instead of calling the traced version:
      
      > dotraplinkage void __kprobes
      > do_async_page_fault(struct pt_regs *regs, unsigned long
      > error_code)
      > {
      >         enum ctx_state prev_state;
      >
      >         switch (kvm_read_and_reset_pf_reason()) {
      >         default:
      >                 do_page_fault(regs, error_code);
      >                 break;
      >         case KVM_PV_REASON_PAGE_NOT_PRESENT:
      
      I'm also having problems with the page fault tracing on bare
      metal (same symptom of no trace output).  I'm unsure if it's
      related.
      
      Steven had an alternative to this which has zero overhead when
      tracing is off, whereas this one includes the standard noops even when
      tracing is disabled.  I'm unconvinced that the extra complexity
      of his approach:
      
      	http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home
      
      is worth it, especially considering that the KVM code is already
      making page fault entry slower here.  This solution is
      dirt-simple.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: get CPL from SS.DPL · ae9fedc7
      Paolo Bonzini authored
      CS.RPL is not equal to the CPL in the few instructions between
      setting CR0.PE and reloading CS.  And CS.DPL is also not equal
      to the CPL for conforming code segments.
      
      However, SS.DPL *is* always equal to the CPL except for the weird
      case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
      value in the STAR MSR but forces CPL=3 (Intel instead forces
      SS.DPL=SS.RPL=CPL=3).
      
      So this patch:
      
      - modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
      the above case with SYSRET is not broken further, and the way
      to fix it would be to pass the CPL to userspace and back
      
      - modifies VMX to always return the CPL from SS.DPL (except
      forcing it to 0 if we are emulating real mode via vm86 mode;
      in vm86 mode all DPLs have to be 3, but real mode does allow
      privileged instructions).  It also removes the CPL cache,
      which becomes a duplicate of the SS access rights cache.
      
      This fixes calling KVM_IOCTL_SET_SREGS right after setting
      CR0.PE=1 but before CS has been reloaded.
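
      The VMX side of the rule reduces to reading the DPL field out of the
      SS access rights. Below is a sketch with hypothetical function names.
      It ignores the AMD SYSRET corner case discussed above.
```c
#include <stdint.h>

/* DPL occupies bits 6:5 of a segment's access-rights byte. */
static unsigned ar_dpl(uint32_t access_rights)
{
	return (access_rights >> 5) & 3;
}

/* CPL per the rule above: SS.DPL, except that it is forced to 0 while
 * real mode is being emulated via vm86 mode (where every DPL is 3). */
static unsigned cpl_from_ss(uint32_t ss_access_rights, int emulating_real_mode)
{
	if (emulating_real_mode)
		return 0;
	return ar_dpl(ss_access_rights);
}
```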
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: check CS.DPL against RPL during task switch · 5045b468
      Paolo Bonzini authored
      Table 7-1 of the SDM mentions a check that the code segment's
      DPL must match the selector's RPL.  KVM did not perform this
      check; fix it.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: drop set_rflags callback · fb5e336b
      Paolo Bonzini authored
      Not needed anymore now that the CPL is computed directly
      during task switch.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: use new CS.RPL as CPL during task switch · 2356aaeb
      Paolo Bonzini authored
      During task switch, all of CS.DPL, CS.RPL, SS.DPL must match (in addition
      to all the other requirements) and will be the new CPL.  So far this
      worked by carefully setting the CS selector and flag before doing the
      task switch; setting CS.selector will already change the CPL.
      
      However, this will not work once we get the CPL from SS.DPL, because
      then you will have to set the full segment descriptor cache to change
      the CPL.  ctxt->ops->cpl(ctxt) will then return the old CPL during the
      task switch, and the check that SS.DPL == CPL will fail.
      
      Temporarily assume that the CPL comes from CS.RPL during task switch
      to a protected-mode task.  This is the same approach used in QEMU's
      emulation code, which (until version 2.0) manually tracks the CPL.
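
      The matching rule described above can be sketched as a single
      function. This is a hypothetical model, not the emulator's code, and
      it does not handle conforming code segments.
```c
#include <stdint.h>

/* During a task switch to a protected-mode task, take the new CPL from
 * CS.RPL and require CS.DPL and SS.DPL to match it. Returns the CPL, or
 * -1 if the checks fail. */
static int task_switch_cpl(uint16_t cs_selector, unsigned cs_dpl, unsigned ss_dpl)
{
	unsigned cpl = cs_selector & 3;	/* RPL: low two selector bits */

	if (cs_dpl != cpl || ss_dpl != cpl)
		return -1;	/* the emulator would raise a fault here */
	return (int)cpl;
}
```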
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 16 May, 2014 8 commits