1. 22 Apr, 2015 3 commits
    • Merge tag 'kvm-arm-for-4.1-take2' of... · 2fa462f8
      Paolo Bonzini authored
      Merge tag 'kvm-arm-for-4.1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
      
      KVM/ARM changes for v4.1, take #2:
      
      Rather small this time:
      
      - a fix for a nasty bug with virtual IRQ injection
      - a fix for irqfd
    • KVM: arm/arm64: check IRQ number on userland injection · fd1d0ddf
      Andre Przywara authored
      When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently
      only check it against a fixed limit, which historically is set
      to 127. With the new dynamic IRQ allocation the effective limit may
      actually be smaller (64).
      So when a malicious or buggy userland now injects an SPI in the range
      between the effective and the fixed limit, we spill over on our VGIC
      bitmaps and bytemaps memory.
      I could trigger a host kernel NULL pointer dereference with current
      mainline by injecting some bogus IRQ number from a hacked kvmtool:
      -----------------
      ....
      DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
      DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
      DEBUG: IRQ #114 still in the game, writing to bytemap now...
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = ffffffc07652e000
      [00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000
      Internal error: Oops: 96000006 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
      Hardware name: FVP Base (DT)
      task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000
      PC is at kvm_vgic_inject_irq+0x234/0x310
      LR is at kvm_vgic_inject_irq+0x30c/0x310
      pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145
      .....
      
      So this patch fixes this by checking the SPI number against the
      actual limit. Also we remove the former legacy hard limit of
      127 in the ioctl code.
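
      The check described above can be sketched in plain C. This is only an
      illustration, not the kernel code: struct sketch_vgic, its nr_irqs field,
      sketch_check_spi() and the assumption that the first 32 interrupt numbers
      are private (non-SPI) interrupts are all hypothetical names and values
      chosen for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption for this sketch: IRQ numbers 0..31 are per-CPU, not SPIs. */
#define SKETCH_NR_PRIVATE_IRQS 32u

/* Hypothetical stand-in for the per-VM interrupt controller state. */
struct sketch_vgic {
    uint32_t nr_irqs;   /* IRQs actually allocated for this VM, e.g. 64 */
};

/*
 * Validate a userland-supplied SPI number against the limit that is
 * actually in effect for this VM rather than against a fixed constant.
 * Returns 0 when the number is usable, -EINVAL otherwise.
 */
int sketch_check_spi(const struct sketch_vgic *vgic, uint32_t irq)
{
    if (irq < SKETCH_NR_PRIVATE_IRQS)
        return -EINVAL;         /* not an SPI at all */
    if (irq >= vgic->nr_irqs)
        return -EINVAL;         /* would index past the allocated maps */
    return 0;
}
```

      With nr_irqs set to 64, IRQ 114 from the trace above is rejected before
      any bitmap or bytemap is touched.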
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
      CC: <stable@vger.kernel.org> # 4.0, 3.19, 3.18
      [maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
      as suggested by Christopher Covington]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi · 0b3289eb
      Eric Auger authored
      irqfd/arm currently does not support routing. kvm_irq_map_gsi is
      supposed to return all the routing entries associated with the
      provided gsi and return the number of those entries. We should
      return 0 at this point.
      Signed-off-by: Eric Auger <eric.auger@linaro.org>
      Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  2. 21 Apr, 2015 23 commits
    • Merge tag 'signed-kvm-ppc-queue' of git://github.com/agraf/linux-2.6 into kvm-master · 123857a7
      Paolo Bonzini authored
      Patch queue for ppc - 2015-04-21
      
      This is the latest queue for KVM on PowerPC changes. Highlights this
      time around:
      
        - Book3S HV: Debugging aids
        - Book3S HV: Minor performance improvements
        - Book3S HV: Cleanups
    • KVM: VMX: Preserve host CR4.MCE value while in guest mode. · 085e68ee
      Ben Serebrin authored
      The host's decision to enable machine check exceptions should remain
      in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
      and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
      
      Tested: Built.
      On earlier version, tested by injecting machine check
      while a guest is spinning.
      
      Before the change, if guest CR4.MCE==0, then the machine check is
      escalated to Catastrophic Error (CATERR) and the machine dies.
      If guest CR4.MCE==1, then the machine check causes VMEXIT and is
      handled normally by host Linux. After the change, injecting a machine
      check causes normal Linux machine check handling.
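
      A minimal sketch of the idea, assuming the value loaded into the
      hardware CR4 for the guest is formed by OR-ing the host's MCE bit into
      the guest-requested value. sketch_hw_cr4() and SKETCH_CR4_MCE are
      hypothetical names; bit 6 is the architectural position of CR4.MCE.

```c
#include <stdint.h>

/* CR4.MCE is bit 6 of CR4 on x86. */
#define SKETCH_CR4_MCE (UINT64_C(1) << 6)

/*
 * Compute the CR4 value used while the guest runs: whatever the guest
 * asked for, plus the host's machine-check enable bit carried over
 * unchanged so that a machine check is still delivered to the host.
 */
uint64_t sketch_hw_cr4(uint64_t host_cr4, uint64_t guest_cr4)
{
    return (host_cr4 & SKETCH_CR4_MCE) | guest_cr4;
}
```

      The guest's own view of CR4 is not modelled here; the sketch only shows
      how the host bit survives a guest value of 0.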
      Signed-off-by: Ben Serebrin <serebrin@google.com>
      Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 · 66feed61
      Paul Mackerras authored
      This uses msgsnd where possible for signalling other threads within
      the same core on POWER8 systems, rather than IPIs through the XICS
      interrupt controller.  This includes waking secondary threads to run
      the guest, the interrupts generated by the virtual XICS, and the
      interrupts to bring the other threads out of the guest when exiting.
      
      Aggregated statistics from debugfs across vcpus for a guest with 32
      vcpus, 8 threads/vcore, running on a POWER8, show this before the
      change:
      
       rm_entry:     3387.6ns (228 - 86600, 1008969 samples)
        rm_exit:     4561.5ns (12 - 3477452, 1009402 samples)
        rm_intr:     1660.0ns (12 - 553050, 3600051 samples)
      
      and this after the change:
      
       rm_entry:     3060.1ns (212 - 65138, 953873 samples)
        rm_exit:     4244.1ns (12 - 9693408, 954331 samples)
        rm_intr:     1342.3ns (12 - 1104718, 3405326 samples)
      
      for a test of booting Fedora 20 big-endian to the login prompt.
      
      The time taken for a H_PROD hcall (which is handled in the host
      kernel) went down from about 35 microseconds to about 16 microseconds
      with this change.
      
      The noinline added to kvmppc_run_core turned out to be necessary for
      good performance, at least with gcc 4.9.2 as packaged with Fedora 21
      and a little-endian POWER8 host.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C · eddb60fb
      Paul Mackerras authored
      This replaces the assembler code for kvmhv_commence_exit() with C code
      in book3s_hv_builtin.c.  It also moves the IPI sending code that was
      in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it
      can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq().
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Streamline guest entry and exit · 6af27c84
      Paul Mackerras authored
      On entry to the guest, secondary threads now wait for the primary to
      switch the MMU after loading up most of their state, rather than before.
      This means that the secondary threads get into the guest sooner, in the
      common case where the secondary threads get to kvmppc_hv_entry before
      the primary thread.
      
      On exit, the first thread out increments the exit count and interrupts
      the other threads (to get them out of the guest) before saving most
      of its state, rather than after.  That means that the other threads
      exit sooner and means that the first thread doesn't spend so much
      time waiting for the other threads at the point where the MMU gets
      switched back to the host.
      
      This pulls out the code that increments the exit count and interrupts
      other threads into a separate function, kvmhv_commence_exit().
      This also makes sure that r12 and vcpu->arch.trap are set correctly
      in some corner cases.
      
      Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
      improvement.  Aggregating across vcpus for a guest with 32 vcpus,
      8 threads/vcore, running on a POWER8, gives this before the change:
      
       rm_entry:     avg 4537.3ns (222 - 48444, 1068878 samples)
        rm_exit:     avg 4787.6ns (152 - 165490, 1010717 samples)
        rm_intr:     avg 1673.6ns (12 - 341304, 3818691 samples)
      
      and this after the change:
      
       rm_entry:     avg 3427.7ns (232 - 68150, 1118921 samples)
        rm_exit:     avg 4716.0ns (12 - 150720, 1119477 samples)
        rm_intr:     avg 1614.8ns (12 - 522436, 3850432 samples)
      
      showing a substantial reduction in the time spent per guest entry in
      the real-mode guest entry code, and smaller reductions in the real
      mode guest exit and interrupt handling times.  (The test was to start
      the guest and boot Fedora 20 big-endian to the login prompt.)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Use bitmap of active threads rather than count · 7d6c40da
      Paul Mackerras authored
      Currently, the entry_exit_count field in the kvmppc_vcore struct
      contains two 8-bit counts, one of the threads that have started entering
      the guest, and one of the threads that have started exiting the guest.
      This changes it to an entry_exit_map field which contains two bitmaps
      of 8 bits each.  The advantage of doing this is that it gives us a
      bitmap of which threads need to be signalled when exiting the guest.
      That means that we no longer need to use the trick of setting the
      HDEC to 0 to pull the other threads out of the guest, which led in
      some cases to a spurious HDEC interrupt on the next guest entry.
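
      The two 8-bit bitmaps can be pictured with the sketch below. The layout
      (entry map in bits 0-7, exit map in bits 8-15) and all sketch_* names are
      assumptions made for the illustration, not taken from the patch.

```c
#include <stdint.h>

/* Assumed layout: bits 0-7 = threads that entered, bits 8-15 = exiting. */
static inline uint8_t sketch_entry_map(uint32_t map)
{
    return (uint8_t)(map & 0xff);
}

static inline uint8_t sketch_exit_map(uint32_t map)
{
    return (uint8_t)((map >> 8) & 0xff);
}

/* Record that the given thread (0..7) has started exiting the guest. */
uint32_t sketch_mark_exit(uint32_t map, unsigned thread)
{
    return map | (UINT32_C(0x100) << thread);
}

/*
 * Threads that entered the guest, have not started exiting, and are not
 * the caller: exactly the set that needs to be signalled on exit.
 */
uint8_t sketch_threads_to_signal(uint32_t map, unsigned self)
{
    uint8_t pending = (uint8_t)(sketch_entry_map(map) & ~sketch_exit_map(map));

    return (uint8_t)(pending & ~(1u << self));
}
```

      A plain count could say how many threads are still in the guest, but not
      which ones; the bitmap answers both questions.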
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Use decrementer to wake napping threads · fd6d53b1
      Paul Mackerras authored
      This arranges for threads that are napping due to their vcpu having
      ceded or due to not having a vcpu to wake up at the end of the guest's
      timeslice without having to be poked with an IPI.  We do that by
      arranging for the decrementer to contain a value no greater than the
      number of timebase ticks remaining until the end of the timeslice.
      In the case of a thread with no vcpu, this number is in the hypervisor
      decrementer already.  In the case of a ceded vcpu, we use the smaller
      of the HDEC value and the DEC value.
      
      Using the DEC like this when ceded means we need to save and restore
      the guest decrementer value around the nap.
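
      As a rough sketch of the arithmetic for the ceded case: remember when the
      guest decrementer would expire, nap with the smaller of the two values,
      and recompute the guest value afterwards. struct sketch_vcpu and the
      sketch_* functions are hypothetical, and tick values are plain integers
      rather than real register accesses.

```c
#include <stdint.h>

/* Hypothetical per-vcpu state for the sketch. */
struct sketch_vcpu {
    int64_t dec_expires;    /* timebase value at which the guest DEC hits 0 */
};

/*
 * Before napping: save the guest decrementer as an absolute expiry time
 * and return the value to program into DEC, which is the smaller of the
 * guest DEC and the ticks left in the timeslice (HDEC).
 */
int64_t sketch_prepare_nap(struct sketch_vcpu *v, uint64_t tb_now,
                           int64_t guest_dec, int64_t hdec)
{
    v->dec_expires = (int64_t)tb_now + guest_dec;
    return guest_dec < hdec ? guest_dec : hdec;
}

/* After waking: rebuild the guest decrementer from the saved expiry. */
int64_t sketch_restore_dec(const struct sketch_vcpu *v, uint64_t tb_now)
{
    return v->dec_expires - (int64_t)tb_now;
}
```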
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI · ccc07772
      Paul Mackerras authored
      When running a multi-threaded guest and vcpu 0 in a virtual core
      is not running in the guest (i.e. it is busy elsewhere in the host),
      thread 0 of the physical core will switch the MMU to the guest and
      then go to nap mode in the code at kvm_do_nap.  If the guest sends
      an IPI to thread 0 using the msgsndp instruction, that will wake
      up thread 0 and cause all the threads in the guest to exit to the
      host unnecessarily.  To avoid the unnecessary exit, this arranges
      for the PECEDP bit to be cleared in this situation.  When napping
      due to a H_CEDE from the guest, we still set PECEDP so that the
      thread will wake up on an IPI sent using msgsndp.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken · 5d5b99cd
      Paul Mackerras authored
      We can tell when a secondary thread has finished running a guest by
      the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
      is no real need for the nap_count field in the kvmppc_vcore struct.
      This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
      pointers of the secondary threads rather than polling vc->nap_count.
      Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
      this also means that we can tell which secondary threads have got
      stuck and thus print a more informative error message.
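
      One pass of such a poll might look like the sketch below, which reports
      which secondary threads still hold a vcpu pointer. struct sketch_hstate
      and sketch_stuck_threads() are hypothetical; the real code also has to
      retry until a timeout, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_THREADS 8

/* Hypothetical per-thread state; only the pointer matters for the sketch. */
struct sketch_hstate {
    void *kvm_vcpu;     /* cleared by the thread when it has finished */
};

/*
 * Return a bitmap with bit i set for every secondary thread i (i >= 1)
 * whose vcpu pointer is still non-NULL, i.e. which has not finished yet.
 * Thread 0 is the primary and is not checked.
 */
uint8_t sketch_stuck_threads(const struct sketch_hstate *threads, size_t n)
{
    uint8_t stuck = 0;

    for (size_t i = 1; i < n && i < SKETCH_MAX_THREADS; i++) {
        if (threads[i].kvm_vcpu != NULL)
            stuck |= (uint8_t)(1u << i);
    }
    return stuck;
}
```

      Because the result identifies individual threads, an error message can
      name the ones that are stuck, which a single counter could not do.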
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu · 25fedfca
      Paul Mackerras authored
      Rather than calling cond_resched() in kvmppc_run_core() before doing
      the post-processing for the vcpus that we have just run (that is,
      calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do
      that post-processing before calling cond_resched(), and that post-
      processing is moved out into its own function, post_guest_process().
      
      The reschedule point is now in kvmppc_run_vcpu() and we define a new
      vcore state, VCORE_PREEMPT, to indicate that the vcore's runner
      task is runnable but not running.  (Doing the reschedule with the
      vcore in VCORE_INACTIVE state would be bad because there are potentially
      other vcpus waiting for the runner in kvmppc_wait_for_exec() which
      then wouldn't get woken up.)
      
      Also, we make use of the handy cond_resched_lock() function, which
      unlocks and relocks vc->lock for us around the reschedule.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Minor cleanups · 1f09c3ed
      Paul Mackerras authored
      * Remove unused kvmppc_vcore::n_busy field.
      * Remove setting of RMOR, since it was only used on PPC970 and the
        PPC970 KVM support has been removed.
      * Don't use r1 or r2 in setting the runlatch since they are
        conventionally reserved for other things; use r0 instead.
      * Streamline the code a little and remove the ext_interrupt_to_host
        label.
      * Add some comments about register usage.
      * hcall_try_real_mode doesn't need to be global, and can't be
        called from C code anyway.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update · d911f0be
      Paul Mackerras authored
      Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
      update (i.e. one of its 3 virtual processor areas needed to be pinned
      in memory so the host real mode code can update it on guest entry and
      exit), we would drop the vcore lock and do the update there and then.
      Future changes will make it inconvenient to drop the lock, so instead
      we now remove it from the list of runnable VCPUs and wake up its
      VCPU task.  This will have the effect that the VCPU task will exit
      kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
      re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
      to kvmppc_update_vpas() and then rejoin the vcore.
      
      The one complication is that the runner VCPU (whose VCPU task is the
      current task) might be one of the ones that gets removed from the
      runnable list.  In that case we just return from kvmppc_run_core()
      and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
      the runner if necessary.
      
      This all means that the VCORE_STARTING state is no longer used, so we
      remove it.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Accumulate timing information for real-mode code · b6c295df
      Paul Mackerras authored
      This reads the timebase at various points in the real-mode guest
      entry/exit code and uses that to accumulate total, minimum and
      maximum time spent in those parts of the code.  Currently these
      times are accumulated per vcpu in 5 parts of the code:
      
      * rm_entry - time taken from the start of kvmppc_hv_entry() until
        just before entering the guest.
      * rm_intr - time from when we take a hypervisor interrupt in the
        guest until we either re-enter the guest or decide to exit to the
        host.  This includes time spent handling hcalls in real mode.
      * rm_exit - time from when we decide to exit the guest until the
        return from kvmppc_hv_entry().
      * guest - time spent in the guest
      * cede - time spent napping in real mode due to an H_CEDE hcall
        while other threads in the same vcore are active.
      
      These times are exposed in debugfs in a directory per vcpu that
      contains a file called "timings".  This file contains one line for
      each of the 5 timings above, with the name followed by a colon and
      4 numbers, which are the count (number of times the code has been
      executed), the total time, the minimum time, and the maximum time,
      all in nanoseconds.
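
      A small userspace sketch for reading one line of that file, following the
      field order given above (count, total, minimum, maximum). The struct and
      function names are hypothetical, and the exact whitespace in the real
      file is an assumption.

```c
#include <stdio.h>

/* One parsed line of a per-vcpu "timings" file. */
struct sketch_timing {
    char name[32];
    unsigned long long count;       /* times the code was executed */
    unsigned long long total_ns;    /* accumulated time */
    unsigned long long min_ns;
    unsigned long long max_ns;
};

/* Parse "name: count total min max"; returns 0 on success, -1 otherwise. */
int sketch_parse_timing(const char *line, struct sketch_timing *t)
{
    int n = sscanf(line, " %31[^:]: %llu %llu %llu %llu",
                   t->name, &t->count, &t->total_ns, &t->min_ns, &t->max_ns);

    return n == 5 ? 0 : -1;
}

/* Average time per execution, or 0 if the code never ran. */
double sketch_avg_ns(const struct sketch_timing *t)
{
    return t->count ? (double)t->total_ns / (double)t->count : 0.0;
}
```

      Summing count and total across vcpus before dividing gives an aggregate
      average per timing name.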
      
      The overhead of the extra code amounts to about 30ns for an hcall that
      is handled in real mode (e.g. H_SET_DABR), which is about 25%.  Since
      production environments may not wish to incur this overhead, the new
      code is conditional on a new config symbol,
      CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT · e23a808b
      Paul Mackerras authored
      This creates a debugfs directory for each HV guest (assuming debugfs
      is enabled in the kernel config), and within that directory, a file
      by which the contents of the guest's HPT (hashed page table) can be
      read.  The directory is named vmnnnn, where nnnn is the PID of the
      process that created the guest.  The file is named "htab".  This is
      intended to help in debugging problems in the host's management
      of guest memory.
      
      The contents of the file consist of a series of lines like this:
      
        3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196
      
      The first field is the index of the entry in the HPT, the second and
      third are the HPT entry, so the third entry contains the real page
      number that is mapped by the entry if the entry's valid bit is set.
      The fourth field is the guest's view of the second doubleword of the
      entry, so it contains the guest physical address.  (The formats of the
      second through fourth fields are described in the Power ISA and also
      in arch/powerpc/include/asm/mmu-hash64.h.)
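
      A line in this format can be split into its four hexadecimal fields with
      a sketch like the following. The names are hypothetical, and treating
      the least-significant bit of the second field as the valid bit is an
      assumption of this example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One parsed line of the "htab" debugfs file. */
struct sketch_hpte_line {
    unsigned long index;    /* index of the entry in the HPT */
    uint64_t v;             /* first doubleword of the HPT entry */
    uint64_t r;             /* second doubleword (real page number) */
    uint64_t guest_r;       /* guest's view of the second doubleword */
};

/* Parse four whitespace-separated hex fields; 0 on success, -1 otherwise. */
int sketch_parse_htab_line(const char *line, struct sketch_hpte_line *e)
{
    int n = sscanf(line, " %lx %" SCNx64 " %" SCNx64 " %" SCNx64,
                   &e->index, &e->v, &e->r, &e->guest_r);

    return n == 4 ? 0 : -1;
}

/* Assumption: the valid bit is bit 0 of the first doubleword. */
int sketch_hpte_valid(const struct sketch_hpte_line *e)
{
    return (int)(e->v & 1u);
}
```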
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Add ICP real mode counters · 6e0365b7
      Suresh Warrier authored
      Add two counters to count how often we generate real-mode ICS resend
      and reject events. The counters provide some performance statistics
      that could be used in the future to consider if the real mode functions
      need further optimizing. The counters are displayed as part of ICS and
      ICP state provided by /sys/kernel/debug/powerpc/kvm* for each VM.
      
      Also added two counters that count (approximately) how many times we
      don't find an ICP or ICS we're looking for. These are not currently
      exposed through sysfs, but can be useful when debugging crashes.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode · b0221556
      Suresh Warrier authored
      Interrupt-based hypercalls return H_TOO_HARD to inform KVM that it needs
      to switch to the host to complete the rest of hypercall function in
      virtual mode. This patch ports the virtual mode ICS/ICP reject and resend
      functions to be runnable in hypervisor real mode, thus avoiding the need
      to switch to the host to execute these functions in virtual mode. However,
      the hypercalls continue to return H_TOO_HARD for vcpu_wakeup and notify
      events - these events cannot be done in real mode and they will still need
      a switch to host virtual mode.
      
      There are sufficient differences between the real mode code and the
      virtual mode code for the ICS/ICP resend and reject functions that
      for now the code has been duplicated instead of sharing common code.
      In the future, we can look at creating common functions.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock · 34cb7954
      Suresh Warrier authored
      Replaces the ICS mutex lock with a spin lock since we will be porting
      these routines to real mode. Note that we need to disable interrupts
      before we take the lock in anticipation of the fact that on the guest
      side, we are running in the context of a hard irq and interrupts are
      disabled (EE bit off) when the lock is acquired. Again, because we
      will be acquiring the lock in hypervisor real mode, we need to use
      an arch_spinlock_t instead of a normal spinlock here as we want to
      avoid running any lockdep code (which may not be safe to execute in
      real mode).
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Add guest->host real mode completion counters · 878610fe
      Suresh E. Warrier authored
      Add counters to track the number of times we switch from guest real mode
      to host virtual mode during an interrupt-related hypercall because the
      hypercall requires actions that cannot be completed in real mode. This
      will help when making optimizations that reduce guest-host transitions.
      
      It is safe to use an ordinary increment rather than an atomic operation
      because there is one ICP per virtual CPU and kvmppc_xics_rm_complete()
      only works on the ICP for the current VCPU.
      
      The counters are displayed as part of ICS and ICP state provided by
      /sys/kernel/debug/powerpc/kvm* for each VM.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte · a4bd6eb0
      Aneesh Kumar K.V authored
      This adds helper routines for locking and unlocking HPTEs, and uses
      them in the rest of the code.  We don't change any locking rules in
      this patch.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Remove RMA-related variables from code · 31037eca
      Aneesh Kumar K.V authored
      We don't support real-mode areas now that 970 support is removed.
      Remove the remaining details of rma from the code.  Also rename
      rma_setup_done to hpte_setup_done to better reflect the changes.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation. · e928e9cb
      Michael Ellerman authored
      Some PowerNV systems include a hardware random-number generator.
      This HWRNG is present on POWER7+ and POWER8 chips and is capable of
      generating one 64-bit random number every microsecond.  The random
      numbers are produced by sampling a set of 64 unstable high-frequency
      oscillators and are almost completely entropic.
      
      PAPR defines an H_RANDOM hypercall which guests can use to obtain one
      64-bit random sample from the HWRNG.  This adds a real-mode
      implementation of the H_RANDOM hypercall.  This hypercall was
      implemented in real mode because the latency of reading the HWRNG is
      generally small compared to the latency of a guest exit and entry for
      all the threads in the same virtual core.
      
      Userspace can detect the presence of the HWRNG and the H_RANDOM
      implementation by querying the KVM_CAP_PPC_HWRNG capability.  The
      H_RANDOM hypercall implementation will only be invoked when the guest
      does an H_RANDOM hypercall if userspace first enables the in-kernel
      H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM · 99342cf8
      David Gibson authored
      On POWER, storage caching is usually configured via the MMU - attributes
      such as cache-inhibited are stored in the TLB and the hashed page table.
      
      This makes correctly performing cache inhibited IO accesses awkward when
      the MMU is turned off (real mode).  Some CPU models provide special
      registers to control the cache attributes of real mode load and stores but
      this is not at all consistent.  This is a problem in particular for SLOF,
      the firmware used on KVM guests, which runs entirely in real mode, but
      which needs to do IO to load the kernel.
      
      To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
      and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
      a logical address (aka guest physical address).  SLOF uses these for IO.
      
      However, because these are implemented within qemu, not the host kernel,
      these bypass any IO devices emulated within KVM itself.  The simplest way
      to see this problem is to attempt to boot a KVM guest from a virtio-blk
      device with iothread / dataplane enabled.  The iothread code relies on an
      in kernel implementation of the virtio queue notification, which is not
      triggered by the IO hcalls, and so the guest will stall in SLOF unable to
      load the guest OS.
      
      This patch addresses this by providing in-kernel implementations of the
      2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
      address not handled by the KVM IO bus will cause a VM exit, hitting the
      qemu implementation as before.
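
      The dispatch described here, in-kernel devices first and userspace as the
      fallback, can be sketched as follows. Everything in the block is
      hypothetical: the sketch_io_dev table stands in for the KVM IO bus, and
      sketch_echo_read() is a dummy device that exists only so the example has
      something to call.

```c
#include <stddef.h>
#include <stdint.h>

enum sketch_result {
    SKETCH_HANDLED_IN_KERNEL,
    SKETCH_EXIT_TO_USERSPACE,
};

/* Hypothetical in-kernel device occupying [base, base + len). */
struct sketch_io_dev {
    uint64_t base;
    uint64_t len;
    uint64_t (*read)(uint64_t offset, unsigned size);
};

/* Dummy device for illustration: returns the offset that was read. */
uint64_t sketch_echo_read(uint64_t offset, unsigned size)
{
    (void)size;
    return offset;
}

/*
 * Try every registered device; if one covers the whole access, let it
 * service the load. Otherwise report that the request has to go out to
 * userspace, as it did before in-kernel handling existed.
 */
enum sketch_result sketch_ci_load(const struct sketch_io_dev *bus, size_t ndevs,
                                  uint64_t addr, unsigned size, uint64_t *val)
{
    for (size_t i = 0; i < ndevs; i++) {
        const struct sketch_io_dev *d = &bus[i];

        if (addr >= d->base && size <= d->len &&
            addr - d->base <= d->len - size) {
            *val = d->read(addr - d->base, size);
            return SKETCH_HANDLED_IN_KERNEL;
        }
    }
    return SKETCH_EXIT_TO_USERSPACE;
}
```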
      
      Note that a userspace change is also required, in order to enable these
      new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      [agraf: fix compilation]
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • powerpc: Export __spin_yield · ae75116e
      Suresh E. Warrier authored
      Export __spin_yield so that the arch_spin_unlock() function can
      be invoked from a module. This will be required for modules where
      we want to take a lock that is also acquired in hypervisor
      real mode. Because we want to avoid running any lockdep code
      (which may not be safe in real mode), this lock needs to be
      an arch_spinlock_t instead of a normal spinlock.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  3. 15 Apr, 2015 3 commits
  4. 14 Apr, 2015 5 commits
    • KVM: x86: cleanup kvm_irq_delivery_to_apic_fast · bea15428
      Paolo Bonzini authored
      Sparse is reporting a "we previously assumed 'src' could be null" error.
      This is true as far as the static analyzer can see, but in practice only
      IPIs can set shorthand to self and they also set 'src', so it's ok.
      Still, move the initialization of x2apic_ipi (and thus the NULL check for
      src) right before the first use.
      
      While at it, initializing ret to "false" is somewhat confusing because of
      the almost immediate assignment of "true" to the same variable.  Thus,
      initialize it to "true" and modify it in the only path that used to use
      the value from "bool ret = false".  There is no change in generated code
      from this change.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fix MSR_IA32_BNDCFGS in msrs_to_save · 9e9c3fe4
      Nadav Amit authored
      kvm_init_msr_list is currently called before hardware_setup. As a result,
      vmx_mpx_supported always returns false when kvm_init_msr_list checks whether to
      save MSR_IA32_BNDCFGS.
      
      Move kvm_init_msr_list after vmx_hardware_setup is called to fix this issue.
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      
      Message-Id: <1428864435-4732-1-git-send-email-namit@cs.technion.ac.il>
      Cc: stable@vger.kernel.org # 3.15+
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'staging-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · b79013b2
      Linus Torvalds authored
      Pull staging driver updates from Greg KH:
       "Here's the big staging driver patchset for 4.1-rc1.
      
        There's a lot of patches here, the Outreachy application period
        happened during this development cycle, so that means that there was a
        lot of cleanup patches accepted.  Other than the normal coding style
        and sparse fixes here, there are some driver updates and work toward
        making some of the drivers into "mergable" shape (like the Unisys
        drivers.)
      
        All of these have been in linux-next for a while"
      
      * tag 'staging-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1214 commits)
        staging: lustre: orthography & coding style
        staging: lustre: lnet: lnet: fix error return code
        staging: lustre: fix sparse warning
        Revert "Staging: sm750fb: Fix C99 Comments"
        Staging: rtl8192u: use correct array for debug output
        staging: rtl8192e: Remove dead code
        staging: rtl8192e: Comment cleanup (style/format)
        staging: rtl8192e: Fix indentation in rtllib_rx_auth_resp()
        staging: rtl8192e: Decrease nesting of rtllib_rx_auth_resp()
        staging: rtl8192e: Divide rtllib_rx_auth()
        staging: rtl8192e: Fix PRINTK_WITHOUT_KERN_LEVEL warnings
        staging: rtl8192e: Fix DO_WHILE_MACRO_WITH_TRAILING_SEMICOLON warning
        staging: rtl8192e: Fix BRACES warning
        staging: rtl8192e: Fix LINE_CONTINUATIONS warning
        staging: rtl8192e: Fix UNNECESSARY_PARENTHESES warnings
        staging: rtl8192e: remove unused EXPORT_SYMBOL_RSL macro
        staging: rtl8192e: Fix RETURN_VOID warnings
        staging: rtl8192e: Fix UNNECESSARY_ELSE warning
        staging: rtl8723au: Remove unneeded comments
        staging: rtl8723au: Use __func__ in trace logs
        ...
      b79013b2
    • Merge tag 'driver-core-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · c4be50ee
      Linus Torvalds authored
      Pull driver core updates from Greg KH:
       "Here's the driver-core / kobject / lz4 tree update for 4.1-rc1.
      
        Everything here has been in linux-next for a while with no reported
        issues.  It's mostly just coding style cleanups, with other minor
        changes in here as well, nothing big"
      
      * tag 'driver-core-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
        debugfs: allow bad parent pointers to be passed in
        stable_kernel_rules: Add clause about specification of kernel versions to patch.
        kobject: WARN as tip when call kobject_get() to a kobject not initialized
        lib/lz4: Pull out constant tables
        drivers: platform: parse IRQ flags from resources
        driver core: Make probe deferral more quiet
        drivers/core/of: Add symlink to device-tree from devices with an OF node
        device: Add dev_of_node() accessor
        drivers: base: fw: fix ret value when loading fw
        firmware: Avoid manual device_create_file() calls
        drivers/base: cacheinfo: validate device node for all the caches
        drivers/base: use tabs where possible in code indentation
        driver core: add missing blank line after declaration
        drivers: base: node: Delete space after pointer declaration
        drivers: base: memory: Use tabs instead of spaces
        firmware_class: Fix whitespace and indentation
        drivers: base: dma-mapping: Erase blank space after pointer
        drivers: base: class: Add a blank line after declarations
        attribute_container: fix missing blank lines after declarations
        drivers: base: memory: Fix switch indent
        ...
      c4be50ee
    • Merge tag 'usb-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 42e3a58b
      Linus Torvalds authored
      Pull USB driver updates from Greg KH:
       "Here's the big USB (and PHY) driver patchset for 4.1-rc1.
      
        Everything here has been in linux-next, and the full details are below
        in the shortlog.  Nothing major, just the normal round of new
        drivers, API updates, and other changes, mostly in the USB gadget area,
        as usual"
      
      * tag 'usb-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (252 commits)
        drivers/usb/core: devio.c: Removed an uneeded space before tab
        usb: dwc2: host: sleep USB_RESUME_TIMEOUT during resume
        usb: chipidea: debug: add low power mode check before print registers
        usb: chipidea: udc: bypass pullup DP when gadget connect in OTG fsm mode
        usb: core: hub: use new USB_RESUME_TIMEOUT
        usb: isp1760: hcd: use new USB_RESUME_TIMEOUT
        usb: dwc2: hcd: use new USB_RESUME_TIMEOUT
        usb: host: sl811: use new USB_RESUME_TIMEOUT
        usb: host: r8a66597: use new USB_RESUME_TIMEOUT
        usb: host: oxu210hp: use new USB_RESUME_TIMEOUT
        usb: host: fusbh200: use new USB_RESUME_TIMEOUT
        usb: host: fotg210: use new USB_RESUME_TIMEOUT
        usb: host: isp116x: use new USB_RESUME_TIMEOUT
        usb: musb: use new USB_RESUME_TIMEOUT
        usb: host: uhci: use new USB_RESUME_TIMEOUT
        usb: host: ehci: use new USB_RESUME_TIMEOUT
        usb: host: xhci: use new USB_RESUME_TIMEOUT
        usb: define a generic USB_RESUME_TIMEOUT macro
        usb: musb: dsps: fix build on i386 when COMPILE_TEST is set
        ehci-hub: use USB_DT_HUB
        ...
      42e3a58b
  5. 13 Apr, 2015 6 commits
    • Merge branch 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 4fd48b45
      Linus Torvalds authored
      Pull cgroup updates from Tejun Heo:
       "Nothing too interesting.  Rik made cpuset cooperate better with
        isolcpus and there are several other cleanup patches"
      
      * 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cpuset, isolcpus: document relationship between cpusets & isolcpus
        cpusets, isolcpus: exclude isolcpus from load balancing in cpusets
        sched, isolcpu: make cpu_isolated_map visible outside scheduler
        cpuset: initialize cpuset a bit early
        cgroup: Use kvfree in pidlist_free()
        cgroup: call cgroup_subsys->bind on cgroup subsys initialization
      4fd48b45
    • Merge branch 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · a1480a16
      Linus Torvalds authored
      Pull libata updates from Tejun Heo:
      
       - Hannes's patchset implements support for better error reporting
         introduced by the new ATA command spec.
      
       - the deprecated pci_ dma API usages have been replaced by dma_ ones.
      
       - a bunch of hardware specific updates and some cleanups.
      
      * 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
        ata: remove deprecated use of pci api
        ahci: st: st_configure_oob must be called after IP is clocked.
        ahci: st: Update the ahci_st DT documentation
        ahci: st: Update the DT example for how to obtain the PHY.
        sata_dwc_460ex: indent an if statement
        libata: Add tracepoints
        libata-eh: Set 'information' field for autosense
        libata: Implement support for sense data reporting
        libata: Implement NCQ autosense
        libata: use status bit definitions in ata_dump_status()
        ide,ata: Rename ATA_IDX to ATA_SENSE
        libata: whitespace fixes in ata_to_sense_error()
        libata: whitespace cleanup in ata_get_cmd_descript()
        libata: use READ_LOG_DMA_EXT
        libata: remove ATA_FLAG_LOWTAG
        sata_dwc_460ex: re-use hsdev->dev instead of dwc_dev
        sata_dwc_460ex: move to generic DMA driver
        sata_dwc_460ex: join messages back
        sata: xgene: add ACPI support for APM X-Gene SATA ports
        ata: sata_mv: add proper definitions for LP_PHY_CTL register values
      a1480a16
    • Merge branch 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · 45141eea
      Linus Torvalds authored
      Pull workqueue updates from Tejun Heo:
       "Workqueue now prints debug information at the end of sysrq-t which
        should be helpful when tracking down suspected workqueue stalls.  It
        only prints out the ones with something currently going on so it
        shouldn't add much output in most cases"
      
      * 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
        workqueue: Reorder sysfs code
        percpu: Fix trivial typos in comments
        workqueue: dump workqueues on sysrq-t
        workqueue: keep track of the flushing task and pool manager
        workqueue: make the workqueues list RCU walkable
      45141eea
    • Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8954672d
      Linus Torvalds authored
      Pull irq core updates from Thomas Gleixner:
       "Managerial summary:
      
        Core code:
         - final removal of IRQF_DISABLED
         - new state save/restore functions for virtualization support
         - wakeup support for stacked irqdomains
         - new function to solve the netpoll synchronization problem
      
        irqchips:
         - new driver for STi based devices
         - new driver for Vybrid MSCM
         - massive cleanup of the GIC driver by moving the GIC-addons to
           stacked irqdomains
         - the usual pile of fixes and updates to the various chip drivers"
      
      * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
        irqchip: GICv3: Add support for irq_[get, set]_irqchip_state()
        irqchip: GIC: Add support for irq_[get, set]_irqchip_state()
        genirq: Allow the irqchip state of an IRQ to be save/restored
        genirq: MSI: Fix freeing of unallocated MSI
        irqchip: renesas-irqc: Add wake-up support
        irqchip: armada-370-xp: Allow using wakeup source
        irqchip: mips-gic: Add new functions to start/stop the GIC counter
        irqchip: tegra: Add Tegra210 support
        irqchip: digicolor: Move digicolor_set_gc to init section
        irqchip: renesas-irqc: Add functional clock to bindings
        irqchip: renesas-irqc: Add minimal runtime PM support
        irqchip: renesas-irqc: Add more register documentation
        DT: exynos: update PMU binding
        ARM: exynos4/5: convert pmu wakeup to stacked domains
        irqchip: gic: Don't complain in gic_get_cpumask() if UP system
        ARM: zynq: switch from gic_arch_extn to gic_set_irqchip_flags
        ARM: ux500: switch from gic_arch_extn to gic_set_irqchip_flags
        ARM: shmobile: remove use of gic_arch_extn.irq_set_wake
        irqchip: gic: Add an entry point to set up irqchip flags
        ARM: omap: convert wakeupgen to stacked domains
        ...
      8954672d
    • Merge tag 'pci-v4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 3be1b98e
      Linus Torvalds authored
      Pull PCI changes from Bjorn Helgaas:
       "Enumeration
          - Read capability list as dwords, not bytes (Sean O. Stalley)
      
        Resource management
          - Don't check for PNP overlaps with unassigned PCI BARs (Bjorn Helgaas)
          - Mark invalid BARs as unassigned (Bjorn Helgaas)
          - Show driver, BAR#, and resource on pci_ioremap_bar() failure (Bjorn Helgaas)
          - Fail pci_ioremap_bar() on unassigned resources (Bjorn Helgaas)
          - Assign resources before drivers claim devices (Yijing Wang)
          - Claim bus resources before pci_bus_add_devices() (Yijing Wang)
      
        Power management
          - Optimize device state transition delays (Aaron Lu)
          - Don't clear ASPM bits when the FADT declares it's unsupported (Matthew Garrett)
      
        Virtualization
          - Add ACS quirks for Intel 1G NICs (Alex Williamson)
      
        IOMMU
          - Add ptr to OF node arg to of_iommu_configure() (Murali Karicheri)
          - Move of_dma_configure() to device.c to help re-use (Murali Karicheri)
          - Fix size when dma-range is not used (Murali Karicheri)
          - Add helper functions pci_get[put]_host_bridge_device() (Murali Karicheri)
          - Add of_pci_dma_configure() to update DMA configuration (Murali Karicheri)
          - Update DMA configuration from DT (Murali Karicheri)
          - dma-mapping: limit IOMMU mapping size (Murali Karicheri)
          - Calculate device DMA masks based on DT dma-range size (Murali Karicheri)
      
        ARM Versatile host bridge driver
          - Check for devm_ioremap_resource() failures (Jisheng Zhang)
      
        Broadcom iProc host bridge driver
          - Add Broadcom iProc PCIe driver (Ray Jui)
      
        Marvell MVEBU host bridge driver
          - Add suspend/resume support (Thomas Petazzoni)
      
        Renesas R-Car host bridge driver
          - Fix position of MSI enable bit (Nobuhiro Iwamatsu)
          - Write zeroes to reserved PCIEPARL bits (Nobuhiro Iwamatsu)
          - Change PCIEPARL and PCIEPARH to PCIEPALR and PCIEPAUR (Nobuhiro Iwamatsu)
          - Verify that mem_res is 64K-aligned (Nobuhiro Iwamatsu)
      
        Samsung Exynos host bridge driver
          - Fix INTx enablement statement termination error (Jaehoon Chung)
      
        Miscellaneous
          - Make a shareable UUID for PCI firmware ACPI _DSM (Aaron Lu)
          - Clarify policy for vendor IDs in pci.txt (Michael S. Tsirkin)"
      
      * tag 'pci-v4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (36 commits)
        PCI: Read capability list as dwords, not bytes
        PCI: layerscape: Simplify platform_get_resource_byname() failure checking
        PCI: keystone: Don't dereference possible NULL pointer
        PCI: versatile: Check for devm_ioremap_resource() failures
        PCI: Don't clear ASPM bits when the FADT declares it's unsupported
        PCI: Clarify policy for vendor IDs in pci.txt
        PCI/ACPI: Optimize device state transition delays
        PCI: Export pci_find_host_bridge() for use inside PCI core
        PCI: Make a shareable UUID for PCI firmware ACPI _DSM
        PCI: Fix typo in Thunderbolt kernel message
        PCI: exynos: Fix INTx enablement statement termination error
        PCI: iproc: Add Broadcom iProc PCIe support
        PCI: iproc: Add DT docs for Broadcom iProc PCIe driver
        PCI: Export symbols required for loadable host driver modules
        PCI: Add ACS quirks for Intel 1G NICs
        PCI: mvebu: Add suspend/resume support
        PCI: Cleanup control flow
        sparc/PCI: Claim bus resources before pci_bus_add_devices()
        PCI: Assign resources before drivers claim devices (pci_scan_root_bus())
        PCI: Fail pci_ioremap_bar() on unassigned resources
        ...
      3be1b98e
    • Merge tag 'hsi-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi · 392b46f3
      Linus Torvalds authored
      Pull HSI changes from Sebastian Reichel:
      
       - nokia-modem: support speech data
       - misc fixes
      
      * tag 'hsi-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
        HSI: cmt_speech: fix error return code
        HSI: nokia-modem: Add cmt-speech support
        HSI: cmt_speech: Add cmt-speech driver
        HSI: nokia-modem: fix error return code
      392b46f3