25 Sep, 2011: 40 commits
    • KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code · 19ccb76a
      Paul Mackerras authored
      With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
      core), whenever a CPU goes idle, we have to pull all the other
      hardware threads in the core out of the guest, because the H_CEDE
      hcall is handled in the kernel.  This is inefficient.
      
      This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
      in real mode.  When a guest vcpu does an H_CEDE hcall, we now only
      exit to the kernel if all the other vcpus in the same core are also
      idle.  Otherwise we mark this vcpu as napping, save state that could
      be lost in nap mode (mainly GPRs and FPRs), and execute the nap
      instruction.  When the thread wakes up, because of a decrementer or
      external interrupt, we come back in at kvm_start_guest (from the
      system reset interrupt vector), find the `napping' flag set in the
      paca, and go to the resume path.
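
      The cede decision above can be sketched as a small C model. This is
      not the real-mode assembly in book3s_hv_rmhandlers.S; struct
      vcore_model, n_vcpus, n_napping, cede_should_exit() and cede_wakeup()
      are hypothetical names, and locking and the GPR/FPR save are omitted.

```c
#include <stdbool.h>

/* Hypothetical model of one virtual core (up to 4 SMT threads). */
struct vcore_model {
    int n_vcpus;    /* vcpus running in this core */
    int n_napping;  /* vcpus that have ceded and are napping */
};

/*
 * Called when a vcpu issues H_CEDE: marks it as napping, then returns
 * true only if every vcpu in the core is now idle, i.e. the core should
 * exit to the host kernel. Otherwise this thread would save the state
 * that nap can lose (GPRs, FPRs) and execute nap.
 */
static bool cede_should_exit(struct vcore_model *vc)
{
    vc->n_napping++;
    return vc->n_napping == vc->n_vcpus;
}

/* A napping thread woke up (decrementer or external interrupt). */
static void cede_wakeup(struct vcore_model *vc)
{
    if (vc->n_napping > 0)
        vc->n_napping--;
}
```

      With four vcpus, the first three cedes keep the core in the guest and
      only the fourth triggers an exit.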
      
      This has some other ramifications.  First, when starting a core, we
      now start all the threads, both those that are immediately runnable and
      those that are idle.  This is so that we don't have to pull all the
      threads out of the guest when an idle thread gets a decrementer interrupt
      and wants to start running.  In fact the idle threads will all start
      with the H_CEDE hcall returning; being idle they will just do another
      H_CEDE immediately and go to nap mode.
      
      This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
      These functions have been restructured to make them simpler and clearer.
      We introduce a level of indirection in the wait queue that gets woken
      when external and decrementer interrupts get generated for a vcpu, so
      that we can have the 4 vcpus in a vcore using the same wait queue.
      We need this because the 4 vcpus are being handled by one thread.
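
      A minimal sketch of that level of indirection, assuming hypothetical
      struct waitq_model, struct vcore_shared and struct vcpu_model types in
      place of the kernel's wait queue and vcpu structures: each vcpu keeps a
      pointer to the queue it is woken on, and vcpus in the same vcore can
      all point at the vcore's shared queue.

```c
#include <stdbool.h>

/* Stand-in for a wait queue: it just counts wakeups. */
struct waitq_model {
    int wakeups;
};

struct vcore_shared {
    struct waitq_model wq;      /* one queue shared by the vcore's vcpus */
};

struct vcpu_model {
    struct waitq_model own_wq;  /* per-vcpu queue, used when not sharing */
    struct waitq_model *wqp;    /* queue to wake: own_wq or the vcore's */
};

static void vcpu_model_init(struct vcpu_model *v, struct vcore_shared *vc,
                            bool share_vcore_queue)
{
    v->own_wq.wakeups = 0;
    v->wqp = share_vcore_queue ? &vc->wq : &v->own_wq;
}

/* Interrupt delivery always wakes through the pointer. */
static void vcpu_model_kick(struct vcpu_model *v)
{
    v->wqp->wakeups++;
}
```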
      
      Secondly, when we need to exit from the guest to the kernel, we now
      have to generate an IPI for any napping threads, because an HDEC
      interrupt doesn't wake up a napping thread.
      
      Thirdly, we now need to be able to handle virtual external interrupts
      and decrementer interrupts becoming pending while a thread is napping,
      and deliver those interrupts to the guest when the thread wakes.
      This is done in kvmppc_cede_reentry, just before fast_guest_return.
      
      Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
      and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
      from kvm_arch_vcpu_runnable.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: book3s_pr: Simplify transitions between virtual and real mode · 02143947
      Paul Mackerras authored
      This simplifies the way that book3s_pr code makes the transition to
      real mode when entering the guest.  We now call kvmppc_entry_trampoline
      (renamed from kvmppc_rmcall) in the base kernel using a normal function
      call instead of doing an indirect call through a pointer in the vcpu.
      If kvm is a module, the module loader takes care of generating a
      trampoline as it does for other calls to functions outside the module.
      
      kvmppc_entry_trampoline then disables interrupts and jumps to
      kvmppc_handler_trampoline_enter in real mode using an rfi[d].
      That then uses the link register as the address to return to
      (potentially in module space) when the guest exits.
      
      This also simplifies the way that we call the Linux interrupt handler
      when we exit the guest due to an external, decrementer or performance
      monitor interrupt.  Instead of turning on the MMU, then deciding that
      we need to call the Linux handler and turning the MMU back off again,
      we now go straight to the handler at the point where we would turn the
      MMU on.  The handler will then return to the virtual-mode code
      (potentially in the module).
      
      Along the way, this moves the setting and clearing of the HID5 DCBZ32
      bit into real-mode interrupts-off code, and also makes sure that
      we clear the MSR[RI] bit before loading values into SRR0/1.
      
      The net result is that we no longer need any code addresses to be
      stored in vcpu->arch.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Assemble book3s{,_hv}_rmhandlers.S separately · 177339d7
      Paul Mackerras authored
      This makes arch/powerpc/kvm/book3s_rmhandlers.S and
      arch/powerpc/kvm/book3s_hv_rmhandlers.S be assembled as
      separate compilation units rather than having them #included in
      arch/powerpc/kernel/exceptions-64s.S.  We no longer have any
      conditional branches between the exception prologs in
      exceptions-64s.S and the KVM handlers, so there is no need to
      keep their contents close together in the vmlinux image.
      
      In their current location, they are using up part of the limited
      space between the first-level interrupt handlers and the firmware
      NMI data area at offset 0x7000, and with some kernel configurations
      this area will overflow (e.g. allyesconfig), leading to an
      "attempt to .org backwards" error when compiling exceptions-64s.S.
      
      Moving them out requires that we add some #includes that the
      book3s_{,hv_}rmhandlers.S code was previously getting implicitly
      via exceptions-64s.S.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Add sanity checking to vcpu_run · af8f38b3
      Alexander Graf authored
      There are multiple features in PowerPC KVM that can now be enabled
      depending on the user's wishes. Some of the combinations don't make
      sense or don't work though.
      
      So this patch adds a way to check if the executing environment would
      actually be able to run the guest properly. It also adds sanity
      checks that PVR is set (should always be true given the current code
      flow), that PAPR is only used with book3s_64, where it works, and that
      HV KVM is only used in PAPR mode.
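
      The checks listed above can be sketched as one predicate. This assumes
      a hypothetical struct ppc_cfg_model with one flag per feature; the
      function and field names in the actual patch may differ.

```c
#include <stdbool.h>

/* Hypothetical summary of the vcpu/VM configuration. */
struct ppc_cfg_model {
    bool pvr_set;       /* has a PVR been set for the vcpu? */
    bool papr_enabled;  /* is the guest running in PAPR mode? */
    bool is_book3s_64;  /* 64-bit Book3S? */
    bool hv_kvm;        /* HV-style KVM rather than PR? */
};

/* Returns 0 if the combination can run, non-zero otherwise. */
static int sanity_check(const struct ppc_cfg_model *c)
{
    if (!c->pvr_set)
        return -1;      /* should not happen with the current code flow */
    if (c->papr_enabled && !c->is_book3s_64)
        return -1;      /* PAPR only works on book3s_64 */
    if (c->hv_kvm && !c->papr_enabled)
        return -1;      /* HV KVM is only used in PAPR mode */
    return 0;
}
```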
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Enable the PAPR CAP for Book3S · 930b412a
      Alexander Graf authored
      Now that Book3S PV mode can also run PAPR guests, we can add a PAPR cap and
      enable it for all Book3S targets. Enabling that CAP switches KVM into PAPR
      mode.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Support SC1 hypercalls for PAPR in PR mode · a668f2bd
      Alexander Graf authored
      PAPR defines hypercalls as SC1 instructions. Using these, the guest modifies
      page tables and does other privileged operations that it wouldn't be allowed
      to do in supervisor mode.
      
      This patch adds support for PR KVM to trap these instructions and route them
      through the same PAPR hypercall interface that we already use for HV style
      KVM.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Stub emulate CFAR and PURR SPRs · aacf9aa3
      Alexander Graf authored
      Recent Linux versions use the CFAR and PURR SPRs, but don't really care about
      their contents (yet). So for now, we can simply return 0 when the guest wants
      to read them.
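
      A sketch of such a stub, assuming a hypothetical emulate_mfspr() hook
      and a locally defined result enum. The SPR numbers match SPRN_CFAR and
      SPRN_PURR in the kernel's asm/reg.h.

```c
#include <stdint.h>

/* SPR numbers as defined in the kernel's asm/reg.h. */
#define SPRN_CFAR 0x1c
#define SPRN_PURR 0x135

enum emul_result { EMULATE_DONE, EMULATE_FAIL };

/* Hypothetical mfspr emulation hook: reads of CFAR and PURR return 0. */
static enum emul_result emulate_mfspr(int sprn, uint64_t *val)
{
    switch (sprn) {
    case SPRN_CFAR:
    case SPRN_PURR:
        *val = 0;             /* guests don't depend on the contents (yet) */
        return EMULATE_DONE;
    default:
        return EMULATE_FAIL;  /* left to other handlers */
    }
}
```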
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Add PAPR hypercall code for PR mode · 0254f074
      Alexander Graf authored
      When running a PAPR guest, we need to handle a few hypercalls in kernel space,
      most prominently the page table invalidation (to sync the shadows).
      
      So this patch adds handling for a few PAPR hypercalls to PR mode KVM. I tried
      to share the code with HV mode, but it ended up being a lot easier this way
      around, as the two differ too much in those details.
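
      The split between in-kernel and user-space hypercalls can be sketched
      as a dispatcher. The hcall numbers match asm/hvcall.h, but
      pr_papr_hcall_in_kernel() is hypothetical and the exact set of calls
      the patch handles in the kernel is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypercall numbers as defined in the kernel's asm/hvcall.h. */
#define H_REMOVE  0x04
#define H_ENTER   0x08
#define H_PROTECT 0x18

/*
 * Hypercalls that modify the guest hash page table are handled in the
 * kernel so the shadow mappings can be invalidated and kept in sync;
 * anything else is reported as not handled and passed to user space.
 */
static bool pr_papr_hcall_in_kernel(uint64_t cmd)
{
    switch (cmd) {
    case H_ENTER:
    case H_REMOVE:
    case H_PROTECT:
        return true;
    default:
        return false;
    }
}
```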
      Signed-off-by: Alexander Graf <agraf@suse.de>
      
      ---
      
      v1 -> v2:
      
        - whitespace fix
    • KVM: PPC: Add support for explicit HIOR setting · a15bd354
      Alexander Graf authored
      Until now, we always set HIOR based on the PVR, but this is just wrong.
      Instead, we should be setting HIOR explicitly, so user space can decide
      what the initial HIOR value is - just like on real hardware.
      
      We keep the old PVR based way around for backwards compatibility, but
      once user space uses the SREGS based method, we drop the PVR logic.
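
      The compatibility rule can be sketched with a single flag. struct
      hior_state, set_pvr() and set_sregs_hior() are hypothetical names, and
      the PVR-derived value is passed in rather than computed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vcpu state for the HIOR handling described above. */
struct hior_state {
    uint64_t hior;
    bool hior_explicit;  /* set once user space provides HIOR via SREGS */
};

/* Legacy path: derive HIOR from the PVR unless user space took over. */
static void set_pvr(struct hior_state *s, uint64_t pvr_derived_hior)
{
    if (!s->hior_explicit)
        s->hior = pvr_derived_hior;
}

/* New path: user space sets HIOR explicitly; PVR logic is then ignored. */
static void set_sregs_hior(struct hior_state *s, uint64_t hior)
{
    s->hior = hior;
    s->hior_explicit = true;
}
```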
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Read out syscall instruction on trap · 77e675ad
      Alexander Graf authored
      We have a few traps where we cache the instruction that causes the trap
      for analysis later on. Since we now need to be able to distinguish
      between SC 0 and SC 1 system calls and the only way to find out which
      is which is by looking at the instruction, we also read out the instruction
      causing the system call.
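
      Telling the two apart from the cached instruction word comes down to
      reading the LEV field of "sc": primary opcode 17, LEV in bits 20-26
      (IBM numbering), so "sc 0" is 0x44000002 and "sc 1" is 0x44000022. The
      helper name sc_level() is hypothetical.

```c
#include <stdint.h>

#define PPC_OPCODE(inst) ((inst) >> 26)          /* primary opcode */
#define SC_LEV(inst)     (((inst) >> 5) & 0x7f)  /* LEV field of "sc" */

/* Returns the LEV of a system-call instruction, or -1 if it isn't "sc". */
static int sc_level(uint32_t inst)
{
    if (PPC_OPCODE(inst) != 17 || (inst & 0x3) != 0x2)
        return -1;
    return (int)SC_LEV(inst);
}
```

      A PAPR hypercall is then simply an "sc" with a level of 1.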
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Interpret SDR1 as HVA in PAPR mode · 04fcc11b
      Alexander Graf authored
      When running a PAPR guest, the guest is not allowed to set SDR1 - instead
      the HTAB information is held in internal hypervisor structures. But all of
      our current code relies on SDR1 and walking the HTAB like on real hardware.
      
      So in order to not be too intrusive, we simply set SDR1 to the HTAB we hold
      in host memory. That way we can keep the HTAB in user space, but use it from
      kernel space to map the guest.
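
      A sketch of decoding SDR1 under the 64-bit hashed page table layout:
      HTABORG occupies the high bits (the table is aligned to at least
      256 KiB) and the low 5 bits hold HTABSIZE, giving a table of
      2^(18 + HTABSIZE) bytes. In PAPR mode the base is treated as a host
      virtual address, as described above. decode_sdr1() is hypothetical.

```c
#include <stdint.h>

struct htab_info {
    uint64_t base;  /* here: host virtual address of the HTAB */
    uint64_t size;  /* in bytes */
};

static struct htab_info decode_sdr1(uint64_t sdr1)
{
    struct htab_info h;

    h.base = sdr1 & ~0x3ffffULL;            /* clear the low 18 bits */
    h.size = 1ULL << (18 + (sdr1 & 0x1f));  /* HTABSIZE in bits 0-4 */
    return h;
}
```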
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Check privilege level on SPRs · 317a8fa3
      Alexander Graf authored
      We have 3 privilege levels: problem state, supervisor state and hypervisor
      state. Each of them can access different SPRs, so on every SPR access we
      need to check whether that SPR is accessible in the current mode.
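
      The check can be sketched as an ordered comparison of privilege
      levels. The MSR bit positions (HV = bit 60, PR = bit 14) match the
      kernel's definitions; struct spr_desc, guest_level() and
      spr_accessible() are hypothetical, and the papr argument folds in, as
      an assumption, the rule from the papr_enabled commit that a PAPR guest
      runs at supervisor rather than hypervisor level.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_HV (1ULL << 60)  /* hypervisor state */
#define MSR_PR (1ULL << 14)  /* problem state */

/* Ordered from least to most privileged. */
enum priv_level { PRIV_PROBLEM = 0, PRIV_SUPER = 1, PRIV_HYPER = 2 };

/* Hypothetical SPR descriptor: the lowest level allowed to access it. */
struct spr_desc {
    int sprn;
    enum priv_level min_level;
};

/* Derive the guest's level from its MSR. */
static enum priv_level guest_level(uint64_t msr, bool papr)
{
    if (msr & MSR_PR)
        return PRIV_PROBLEM;
    if ((msr & MSR_HV) && !papr)
        return PRIV_HYPER;
    return PRIV_SUPER;
}

static bool spr_accessible(const struct spr_desc *spr, enum priv_level cur)
{
    return cur >= spr->min_level;
}
```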
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Add papr_enabled flag · 9432ba60
      Alexander Graf authored
      When running a PAPR guest, some things change. The privilege level drops
      from hypervisor to supervisor, SDR1 gets treated differently and we interpret
      hypercalls. For bisectability's sake, add the flag now, but only enable it when
      all the support code is there.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: move compute_tlbie_rb to book3s common header · db507c30
      Alexander Graf authored
      We will soon need compute_tlbie_rb in both the _pr and _hv implementations
      for PAPR, so let's move it over to a common header file that both
      implementations can leverage.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: Restore missing powerpc API docs · 36442687
      Avi Kivity authored
      Commit 371fefd6 lost a doc hunk somehow, restore it.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: APIC: avoid instruction emulation for EOI writes · 58fbbf26
      Kevin Tian authored
      Instruction emulation for EOI writes can be skipped, since a sane
      guest simply uses MOV instead of string operations. This is a nice
      improvement when the guest supports neither x2apic nor Hyper-V EOI.

      For a single VM, a ~8% bandwidth improvement (7.4Gbps->8Gbps) is
      observed, from saving ~5% of the cycles spent in EOI emulation.
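
      A sketch of the fast path, assuming the exit handler has already
      decoded the APIC-access exit into an access type and a page offset.
      APIC_EOI (0xB0) is the EOI register offset in the local APIC page;
      handle_apic_access_model() and the counters are hypothetical, and
      advancing RIP past the skipped instruction is not modelled.

```c
#include <stdint.h>

#define APIC_EOI 0xb0  /* offset of the EOI register in the APIC page */

enum access_type { ACCESS_LINEAR_READ, ACCESS_LINEAR_WRITE, ACCESS_OTHER };

struct apic_model {
    int eoi_count;       /* EOIs performed directly */
    int emulated_insns;  /* fallbacks to full instruction emulation */
};

static void apic_set_eoi(struct apic_model *apic) { apic->eoi_count++; }
static void emulate_instruction(struct apic_model *apic) { apic->emulated_insns++; }

/* A plain write to EOI is handled without decoding the instruction. */
static void handle_apic_access_model(struct apic_model *apic,
                                     enum access_type type, uint32_t offset)
{
    if (type == ACCESS_LINEAR_WRITE && offset == APIC_EOI) {
        apic_set_eoi(apic);
        return;
    }
    emulate_instruction(apic);
}
```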
      Signed-off-by: Kevin Tian <kevin.tian@intel.com>
      <Based on earlier work from>:
      Signed-off-by: Eddie Dong <eddie.dong@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: SVM: Fix TSC MSR read in nested SVM · 45133eca
      Nadav Har'El authored
      When the TSC MSR is read by an L2 guest (when L1 allowed this MSR to be
      read without exit), we need to return L2's notion of the TSC, not L1's.
      
      The current code incorrectly returned L1 TSC, because svm_get_msr() was also
      used in x86.c where this was assumed, but now that these places call the new
      svm_read_l1_tsc(), the MSR read can be fixed.
      Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
      Tested-by: Joerg Roedel <joerg.roedel@amd.com>
      Acked-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: nVMX: Fix nested VMX TSC emulation · 27fc51b2
      Nadav Har'El authored
      This patch fixes two corner cases in nested (L2) handling of TSC-related
      issues:
      
      1. Somewhat surprisingly, according to the Intel spec, if L1 allows WRMSR to
      the TSC MSR without an exit, then this should set L1's TSC value itself, not
      offset by vmcs12.TSC_OFFSET (as the previous code wrongly did).
      
      2. Allow L1 to disable the TSC_OFFSETING control, and then correctly ignore
      the vmcs12.TSC_OFFSET.
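
      Both corner cases can be sketched with a small offset model, ignoring
      TSC scaling. struct nested_tsc_model, l2_tsc_offset() and
      l2_wrmsr_tsc() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the TSC offsets involved in nested VMX. */
struct nested_tsc_model {
    int64_t l1_offset;          /* offset KVM applies for L1 */
    int64_t vmcs12_tsc_offset;  /* TSC_OFFSET that L1 put in vmcs12 */
    bool vmcs12_offsetting;     /* L1 enabled the TSC_OFFSETING control */
};

/* Case 2: the offset in effect while L2 runs ignores vmcs12's if disabled. */
static int64_t l2_tsc_offset(const struct nested_tsc_model *m)
{
    return m->l1_offset + (m->vmcs12_offsetting ? m->vmcs12_tsc_offset : 0);
}

/*
 * Case 1: WRMSR(TSC) from L2 without an exit sets L1's TSC to value,
 * so only l1_offset changes; vmcs12_tsc_offset is not subtracted.
 */
static void l2_wrmsr_tsc(struct nested_tsc_model *m, uint64_t host_tsc,
                         uint64_t value)
{
    m->l1_offset = (int64_t)(value - host_tsc);
}
```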
      Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: L1 TSC handling · d5c1785d
      Nadav Har'El authored
      KVM assumed in several places that reading the TSC MSR returns the value for
      L1. This is incorrect, because when L2 is running, the correct TSC read exit
      emulation is to return L2's value.
      
      We therefore add a new x86_ops function, read_l1_tsc, to use in places that
      specifically need to read the L1 TSC, NOT the TSC of the current level of
      guest.
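
      The difference between the two reads can be sketched as follows,
      ignoring TSC scaling. struct tsc_model, read_guest_tsc() and
      read_l1_tsc_model() are hypothetical; only the idea of a separate
      L1-specific read comes from the commit.

```c
#include <stdint.h>

/* Hypothetical per-vcpu TSC state while nested. */
struct tsc_model {
    int64_t l1_offset;  /* L1's offset from the host TSC */
    int64_t l2_offset;  /* extra offset L1 applies to L2 */
    int in_l2;          /* non-zero while L2 is running */
};

/* A guest TSC read at the current nesting level (the MSR-read path). */
static uint64_t read_guest_tsc(const struct tsc_model *t, uint64_t host_tsc)
{
    int64_t off = t->l1_offset + (t->in_l2 ? t->l2_offset : 0);
    return host_tsc + (uint64_t)off;
}

/* What callers that need L1's view use, regardless of nesting. */
static uint64_t read_l1_tsc_model(const struct tsc_model *t, uint64_t host_tsc)
{
    return host_tsc + (uint64_t)t->l1_offset;
}
```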
      
      Note that one change, of one line in kvm_arch_vcpu_load, is made redundant
      by a different patch sent by Zachary Amsden (and not yet applied):
      kvm_arch_vcpu_load() should not read the guest TSC, and if it didn't, we
      would of course not have had to change the call of kvm_get_msr() to read_l1_tsc().
      
      [avi: moved callback to kvm_x86_ops tsc block]
      Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
      Acked-by: Zachary Amsdem <zamsden@gmail.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: nVMX: Document 'nested' parameter · e1a72ae2
      Sasha Levin authored
      Add documentation of the new 'nested' parameter to
      'Documentation/kernel-parameters.txt'.
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Nadav Har'El <nyh@il.ibm.com>
      Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: MMU: Fix SMEP failure during fetch · cd46868c
      Yang, Wei Y authored
      This patch fixes kvm-unit-tests hanging and an incorrect PT_ACCESSED_MASK
      bit being set in the case of an SMEP fault.  The code updated 'eperm' after
      the variable was checked.
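
      A sketch of the corrected ordering: the SMEP permission error is
      computed before it is checked, so a faulting fetch never marks the
      PTE accessed. struct fetch_ctx and fetch_faults() are hypothetical and
      model only the SMEP rule (a supervisor-mode fetch from a
      user-accessible page faults when CR4.SMEP is set).

```c
#include <stdbool.h>

/* Inputs for one instruction-fetch permission check. */
struct fetch_ctx {
    bool cr4_smep;   /* CR4.SMEP enabled */
    bool user_mode;  /* CPL == 3 */
    bool pte_user;   /* page is user-accessible (U/S = 1) */
};

/* Returns true if the fetch faults; *set_accessed says whether to set A. */
static bool fetch_faults(const struct fetch_ctx *c, bool *set_accessed)
{
    bool eperm = false;

    /* Compute eperm first... */
    if (c->cr4_smep && !c->user_mode && c->pte_user)
        eperm = true;

    /* ...then check it, before touching the accessed bit. */
    if (eperm) {
        *set_accessed = false;
        return true;
    }
    *set_accessed = true;
    return false;
}
```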
      Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: MMU: Do not unconditionally read PDPTE from guest memory · e4e517b4
      Avi Kivity authored
      Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded.
      On SVM, it is not possible to implement this, but on VMX this is possible
      and was indeed implemented until nested SVM changed this to unconditionally
      read PDPTEs dynamically.  This has a noticeable impact when running PAE guests.
      
      Fix by changing the MMU to read PDPTRs from the cache, falling back to
      reading from memory for the nested MMU.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Tested-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: VMX: trivial: use BUG_ON · cf3ace79
      Julia Lawall authored
      Use BUG_ON(x) rather than if(x) BUG();
      
      The semantic patch that fixes this problem is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@ identifier x; @@
      -if (x) BUG();
      +BUG_ON(x);
      
      @@ identifier x; @@
      -if (!x) BUG();
      +BUG_ON(!x);
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: x86: report valid microcode update ID · 742bc670
      Marcelo Tosatti authored
      Windows Server 2008 SP2 checked build with smp > 1 BSODs during
      boot due to the lack of a microcode update:
      
      *** Assertion failed: The system BIOS on this machine does not properly
      support the processor.  The system BIOS did not load any microcode update.
      A BIOS containing the latest microcode update is needed for system reliability.
      (CurrentUpdateRevision != 0)
      ***   Source File: d:\longhorn\base\hals\update\intelupd\update.c, line 440
      
      Report a non-zero microcode update signature to make it happy.
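
      MSR 0x8B (IA32_BIOS_SIGN_ID, the microcode update revision MSR) keeps
      the update signature in bits 63:32, so a non-zero report must set a
      bit there. get_msr_model() is hypothetical, and revision 1 is an
      illustrative choice; the constant used by the patch may differ.

```c
#include <stdint.h>

#define MSR_IA32_UCODE_REV 0x0000008b

/* Hypothetical rdmsr hook reporting microcode update revision 1. */
static int get_msr_model(uint32_t msr, uint64_t *pdata)
{
    switch (msr) {
    case MSR_IA32_UCODE_REV:
        *pdata = 1ULL << 32;  /* bits 63:32 hold the update signature */
        return 0;
    default:
        return 1;             /* unhandled MSR */
    }
}
```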
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: Make x86_decode_insn() return proper macros · 1d2887e2
      Takuya Yoshikawa authored
      Return EMULATION_OK/FAILED consistently.  Also treat instruction fetch
      errors, not restricted to X86EMUL_UNHANDLEABLE, as EMULATION_FAILED;
      although this cannot happen in practice, the current logic will continue
      the emulation even if the decoder fails to fetch the instruction.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: Let compiler know insn_fetch() rarely fails · 7d88bb48
      Takuya Yoshikawa authored
      Fetching the instruction that the guest was about to execute should not
      normally fail, so the compiler should always predict that it will succeed.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: Drop _size argument from insn_fetch() · e85a1085
      Takuya Yoshikawa authored
      _type alone is enough to determine the size.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: Use ctxt->_eip directly in do_insn_fetch_byte() · 807941b1
      Takuya Yoshikawa authored
      Instead of passing ctxt->_eip from insn_fetch() call sites, get it from
      ctxt in do_insn_fetch_byte().  This is done by replacing the argument
      _eip of insn_fetch() with _ctxt, which should be better than letting the
      macro use ctxt silently in its body.
      
      Though this changes the place where ctxt->_eip is incremented from
      insn_fetch() to do_insn_fetch_byte(), this does not have any real
      effect.
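
      The resulting shape of insn_fetch() can be sketched in user space:
      the size comes from sizeof(_type), the failure branch is marked
      unlikely, and _eip is advanced inside do_insn_fetch_byte(). struct
      emu_ctxt and the goto-label argument are simplifications of the
      kernel's emulator context and error handling, and a little-endian
      host is assumed, as on x86.

```c
#include <stddef.h>
#include <stdint.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/* Minimal stand-in for the emulator context (hypothetical fields). */
struct emu_ctxt {
    const uint8_t *insn;  /* instruction bytes available to the decoder */
    size_t len;
    size_t _eip;          /* offset of the next byte to fetch */
};

/* Fetch one byte and advance _eip here rather than at the call site. */
static int do_insn_fetch_byte(struct emu_ctxt *ctxt, uint8_t *dest)
{
    if (ctxt->_eip >= ctxt->len)
        return -1;
    *dest = ctxt->insn[ctxt->_eip++];
    return 0;
}

static int do_insn_fetch(struct emu_ctxt *ctxt, void *dest, size_t size)
{
    uint8_t *p = dest;

    for (size_t i = 0; i < size; i++)
        if (do_insn_fetch_byte(ctxt, &p[i]))
            return -1;
    return 0;
}

/* Fetch a _type-sized value into _out; jump to label _fail on error. */
#define insn_fetch(_type, _ctxt, _out, _fail)                          \
    do {                                                               \
        _type _x;                                                      \
        if (unlikely(do_insn_fetch((_ctxt), &_x, sizeof(_type)) != 0)) \
            goto _fail;                                                \
        (_out) = _x;                                                   \
    } while (0)
```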
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Intelligent device lookup on I/O bus · 743eeb0b
      Sasha Levin authored
      Currently the method of dealing with an IO operation on a bus (PIO/MMIO)
      is to call the read or write callback for each device registered
      on the bus until we find a device which handles it.
      
      Since the number of devices on a bus can be significant due to ioeventfds
      and coalesced MMIO zones, this leads to a lot of overhead on each IO
      operation.
      
      Instead of registering devices, we now register ranges which point to
      a device. Lookup is done using an efficient bsearch instead of a linear
      search.
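
      A user-space sketch of range registration plus binary search, using
      the C library's qsort() and bsearch(). struct io_range, bus_sort() and
      bus_lookup() are hypothetical, and unlike a real bus the sketch
      assumes the registered ranges do not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One registered range; dev identifies the device that handles it. */
struct io_range {
    uint64_t addr;
    uint64_t len;
    int dev;
};

/* Order ranges by start address. */
static int range_cmp(const void *a, const void *b)
{
    const struct io_range *x = a, *y = b;
    return (x->addr > y->addr) - (x->addr < y->addr);
}

/* bsearch comparator: the key is an address, the element a range. */
static int addr_in_range_cmp(const void *key, const void *elem)
{
    uint64_t addr = *(const uint64_t *)key;
    const struct io_range *r = elem;

    if (addr < r->addr)
        return -1;
    if (addr >= r->addr + r->len)
        return 1;
    return 0;
}

/* Keep the array sorted whenever a range is registered. */
static void bus_sort(struct io_range *ranges, size_t n)
{
    qsort(ranges, n, sizeof(*ranges), range_cmp);
}

/* O(log n) lookup; returns the device, or -1 if no range covers addr. */
static int bus_lookup(const struct io_range *ranges, size_t n, uint64_t addr)
{
    const struct io_range *r =
        bsearch(&addr, ranges, n, sizeof(*ranges), addr_in_range_cmp);
    return r ? r->dev : -1;
}
```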
      
      A performance test was conducted by comparing the exit count per second
      with 200 ioeventfds created on one byte, while the guest continuously
      accesses a different byte (triggering usermode exits).
      Before the patch the guest achieved 259k exits per second; after the
      patch it achieves 274k exits per second.
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Use __print_symbolic() for vmexit tracepoints · 0d460ffc
      Stefan Hajnoczi authored
      The vmexit tracepoints format the exit_reason to make it human-readable.
      Since the exit_reason depends on the instruction set (vmx or svm),
      formatting is handled with ftrace_print_symbols_seq() by referring to
      the appropriate exit reason table.
      
      However, the ftrace_print_symbols_seq() function is not meant to be used
      directly in tracepoints since it does not export the formatting table
      which userspace tools like trace-cmd and perf use to format traces.
      
      In practice perf dies when formatting vmexit-related events and
      trace-cmd falls back to printing the numeric value (with extra
      formatting code in the kvm plugin to paper over this limitation).  Other
      userspace consumers of vmexit-related tracepoints would be in similar
      trouble.
      
      To avoid significant changes to the kvm_exit tracepoint, this patch
      moves the vmx and svm exit reason tables into arch/x86/kvm/trace.h and
      selects the right table with __print_symbolic() depending on the
      instruction set.  Note that __print_symbolic() is designed for exporting
      the formatting table to userspace and allows trace-cmd and perf to work.
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Record instruction set in all vmexit tracepoints · e097e5ff
      Stefan Hajnoczi authored
      The kvm_exit tracepoint recently added the isa argument to aid decoding
      exit_reason.  The semantics of exit_reason depend on the instruction set
      (vmx or svm) and the isa argument allows traces to be analyzed on other
      machines.
      
      Add the isa argument to kvm_nested_vmexit and kvm_nested_vmexit_inject
      so these tracepoints can also be self-describing.
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Really fix HV_X64_MSR_APIC_ASSIST_PAGE · d1613ad5
      Mike Waychison authored
      Commit 0945d4b228 tried to fix the get_msr path for the
      HV_X64_MSR_APIC_ASSIST_PAGE msr, but was poorly tested.  We should be
      returning 0 if the read succeeded, and passing the value back to the
      caller via the pdata out argument, not returning the value directly.
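
      The calling convention the fix restores can be sketched as follows:
      the return value only signals success (0) or failure, and the MSR
      contents travel through the pdata out argument. The MSR number is the
      Hyper-V APIC assist page MSR; struct vcpu_msrs and get_msr() here are
      hypothetical.

```c
#include <stdint.h>

#define HV_X64_MSR_APIC_ASSIST_PAGE 0x40000073

/* Hypothetical guest-visible MSR state. */
struct vcpu_msrs {
    uint64_t assist_page;
};

static int get_msr(const struct vcpu_msrs *v, uint32_t msr, uint64_t *pdata)
{
    switch (msr) {
    case HV_X64_MSR_APIC_ASSIST_PAGE:
        *pdata = v->assist_page;  /* not "return v->assist_page;" */
        return 0;
    default:
        return 1;                 /* unknown MSR */
    }
}
```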
      Signed-off-by: Mike Waychison <mikew@google.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86: get_msr support for HV_X64_MSR_APIC_ASSIST_PAGE · 14fa67ee
      Mike Waychison authored
      "get" support for the HV_X64_MSR_APIC_ASSIST_PAGE msr was missing, even
      though it is explicitly enumerated as something the vmm should save in
      msrs_to_save and reported to userland via the KVM_GET_MSR_INDEX_LIST
      ioctl.
      
      Add "get" support for HV_X64_MSR_APIC_ASSIST_PAGE.  We simply return the
      guest visible value of this register, which seems to be correct as a set
      on the register is validated for us already.
      Signed-off-by: Mike Waychison <mikew@google.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: Make coalesced mmio use a device per zone · 2b3c246a
      Sasha Levin authored
      This patch changes coalesced mmio to create one mmio device per
      zone instead of handling all zones in one device.
      
      Doing so enables us to take advantage of existing locking and prevents
      a race condition between coalesced mmio registration/unregistration
      and lookups.
      Suggested-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: x86: Raise the hard VCPU count limit · 8c3ba334
      Sasha Levin authored
      This patch raises the hard limit on the VCPU count to 254.
      
      This will allow developers to easily work on scalability
      and will allow users to test high VCPU setups easily without
      patching the kernel.
      
      To prevent possible issues with current setups, KVM_CAP_NR_VCPUS
      now returns the recommended VCPU limit (which is still 64) - this
      should be a safe value for everybody, while a new KVM_CAP_MAX_VCPUS
      returns the hard limit which is now 254.
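
      User space obtains both values with
      ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap), which returns 0 for an
      unsupported capability. The sketch below takes those two results as
      plain integers so it runs without /dev/kvm; the fallbacks (assume 4
      without KVM_CAP_NR_VCPUS, assume the NR_VCPUS value without
      KVM_CAP_MAX_VCPUS) follow the KVM API documentation, and the helper
      name is hypothetical.

```c
/* Recommended and hard VCPU limits derived from the two capabilities. */
struct vcpu_limits {
    int recommended;
    int hard;
};

static struct vcpu_limits vcpu_limits_from_caps(int nr_vcpus_cap,
                                                int max_vcpus_cap)
{
    struct vcpu_limits l;

    l.recommended = nr_vcpus_cap > 0 ? nr_vcpus_cap : 4;
    l.hard = max_vcpus_cap > 0 ? max_vcpus_cap : l.recommended;
    return l;
}
```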
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Suggested-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: MMIO: Lock coalesced device when checking for available entry · c298125f
      Sasha Levin authored
      Move the check whether there are available entries to within the spinlock.
      This allows working with a larger number of VCPUs and reduces premature
      exits when many VCPUs are in use.
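
      A sketch of why the check belongs under the lock: if the free-slot
      test and the slot claim are not atomic, two vcpus can both see the
      last free entry. The kernel uses a spinlock; this user-space sketch
      substitutes a pthread mutex, and struct coalesced_ring, RING_MAX and
      ring_try_push() are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

#define RING_MAX 4  /* hypothetical ring size; one slot always stays empty */

struct coalesced_ring {
    pthread_mutex_t lock;
    unsigned first;  /* consumer index, advanced by user space */
    unsigned last;   /* producer index */
    unsigned long long addr[RING_MAX];
};

/* Check for a free entry and claim it within the same critical section. */
static bool ring_try_push(struct coalesced_ring *r, unsigned long long addr)
{
    bool ok = false;

    pthread_mutex_lock(&r->lock);
    if ((r->last + 1) % RING_MAX != r->first) {
        r->addr[r->last] = addr;
        r->last = (r->last + 1) % RING_MAX;
        ok = true;
    }
    pthread_mutex_unlock(&r->lock);
    return ok;  /* false: ring full, the access must go to user space */
}
```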
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: x86: cleanup the code of read/write emulation · 22388a3c
      Xiao Guangrong authored
      Use the read/write operation abstraction to remove the duplicated code.
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86: abstract the operation for read/write emulation · 77d197b2
      Xiao Guangrong authored
      The operations of read emulation and write emulation are very similar, so we
      can abstract them; a later patch uses this abstraction to clean up the
      duplicated code.
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86: fix broken read emulation spans a page boundary · ca7d58f3
      Xiao Guangrong authored
      If the range spans a page boundary, the MMIO access can be broken; fix it
      the same way as write emulation.

      Also, since we already have the guest physical address, use it to read
      guest data directly and avoid walking the guest page table again.
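
      The page-boundary split can be sketched on the guest virtual address:
      each piece is then translated and accessed separately, because
      consecutive virtual pages need not be physically contiguous.
      split_at_page_boundary() is hypothetical, and it assumes 4 KiB pages
      and an access no longer than one page.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Split [addr, addr + len) into at most two pieces, none crossing a page. */
static int split_at_page_boundary(uint64_t addr, size_t len,
                                  uint64_t out_addr[2], size_t out_len[2])
{
    size_t to_boundary = PAGE_SIZE - (size_t)(addr & (PAGE_SIZE - 1));

    if (len <= to_boundary) {
        out_addr[0] = addr;
        out_len[0] = len;
        return 1;
    }
    out_addr[0] = addr;
    out_len[0] = to_boundary;
    out_addr[1] = addr + to_boundary;  /* start of the next page */
    out_len[1] = len - to_boundary;
    return 2;
}
```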
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: fix Src2CL decode · 9be3be1f
      Avi Kivity authored
      Src2CL decode (used for double width shifts) erroneously decodes only bit 3
      of %rcx, instead of bits 7:0.
      
      Fix by decoding %cl in its entirety.
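
      A before/after sketch of the shift-count decode for SHLD/SHRD with a
      %cl operand. The 0x8 mask models "only bit 3" from the description;
      both function names are hypothetical.

```c
#include <stdint.h>

/* Buggy decode: keeps only bit 3 of %rcx. */
static uint8_t src2cl_buggy(uint64_t rcx)
{
    return (uint8_t)(rcx & 0x8);
}

/* Fixed decode: all of %cl, i.e. bits 7:0 of %rcx. */
static uint8_t src2cl_fixed(uint64_t rcx)
{
    return (uint8_t)(rcx & 0xff);
}
```

      For example, a shift count of 5 in %cl was decoded as 0 before the fix.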
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>