Commits · a3a8ff8ebf87cbd828f54276d902c3c8ee0c4781 · Kirill Smelkov / linux

12 Jul, 2011 32 commits

KVM: nVMX: Move host-state field setup to a function · a3a8ff8e

Nadav Har'El authored May 25, 2011

Move the setting of constant host-state fields (fields that do not change
throughout the life of the guest) from vmx_vcpu_setup to a new common function
vmx_set_constant_host_state(). This function will also be used to set the
host state when running L2 guests.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

a3a8ff8e

KVM: nVMX: Implement VMREAD and VMWRITE · 49f705c5

Nadav Har'El authored May 25, 2011

Implement the VMREAD and VMWRITE instructions. With these instructions, L1
can read and write to the VMCS it is holding. The values are read or written
to the fields of the vmcs12 structure introduced in a previous patch.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

49f705c5

KVM: nVMX: Implement VMPTRST · 6a4d7550

Nadav Har'El authored May 25, 2011

This patch implements the VMPTRST instruction.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

6a4d7550

KVM: nVMX: Implement VMPTRLD · 63846663

Nadav Har'El authored May 25, 2011

This patch implements the VMPTRLD instruction.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

63846663

KVM: nVMX: Implement VMCLEAR · 27d6c865

Nadav Har'El authored May 25, 2011

This patch implements the VMCLEAR instruction.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

27d6c865

KVM: nVMX: Success/failure of VMX instructions. · 0140caea

Nadav Har'El authored May 25, 2011

VMX instructions specify success or failure by setting certain RFLAGS bits.
This patch contains common functions to do this, and they will be used in
the following patches which emulate the various VMX instructions.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0140caea

KVM: nVMX: Add VMCS fields to the vmcs12 · 22bd0358

Nadav Har'El authored May 25, 2011

In this patch we add to vmcs12 (the VMCS that L1 keeps for L2) all the
standard VMCS fields.

Later patches will enable L1 to read and write these fields using VMREAD/
VMWRITE, and they will be used during a VMLAUNCH/VMRESUME in preparing vmcs02,
a hardware VMCS for running L2.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

22bd0358

KVM: nVMX: Introduce vmcs02: VMCS used to run L2 · ff2f6fe9

Nadav Har'El authored May 25, 2011

We saw in a previous patch that L1 controls its L2 guest with a vcms12.
L0 needs to create a real VMCS for running L2. We call that "vmcs02".
A later patch will contain the code, prepare_vmcs02(), for filling the vmcs02
fields. This patch only contains code for allocating vmcs02.

In this version, prepare_vmcs02() sets *all* of vmcs02's fields each time we
enter from L1 to L2, so keeping just one vmcs02 for the vcpu is enough: It can
be reused even when L1 runs multiple L2 guests. However, in future versions
we'll probably want to add an optimization where vmcs02 fields that rarely
change will not be set each time. For that, we may want to keep around several
vmcs02s of L2 guests that have recently run, so that potentially we could run
these L2s again more quickly because less vmwrites to vmcs02 will be needed.

This patch adds to each vcpu a vmcs02 pool, vmx->nested.vmcs02_pool,
which remembers the vmcs02s last used to run up to VMCS02_POOL_SIZE L2s.
As explained above, in the current version we choose VMCS02_POOL_SIZE=1,
I.e., one vmcs02 is allocated (and loaded onto the processor), and it is
reused to enter any L2 guest. In the future, when prepare_vmcs02() is
optimized not to set all fields every time, VMCS02_POOL_SIZE should be
increased.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ff2f6fe9

KVM: nVMX: Decoding memory operands of VMX instructions · 064aea77

Nadav Har'El authored May 25, 2011

This patch includes a utility function for decoding pointer operands of VMX
instructions issued by L1 (a guest hypervisor)
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

064aea77

KVM: nVMX: Implement reading and writing of VMX MSRs · b87a51ae

Nadav Har'El authored May 25, 2011

When the guest can use VMX instructions (when the "nested" module option is
on), it should also be able to read and write VMX MSRs, e.g., to query about
VMX capabilities. This patch adds this support.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b87a51ae

KVM: nVMX: Introduce vmcs12: a VMCS structure for L1 · a9d30f33

Nadav Har'El authored May 25, 2011

An implementation of VMX needs to define a VMCS structure. This structure
is kept in guest memory, but is opaque to the guest (who can only read or
write it with VMX instructions).

This patch starts to define the VMCS structure which our nested VMX
implementation will present to L1. We call it "vmcs12", as it is the VMCS
that L1 keeps for its L2 guest. We will add more content to this structure
in later patches.

This patch also adds the notion (as required by the VMX spec) of L1's "current
VMCS", and finally includes utility functions for mapping the guest-allocated
VMCSs in host memory.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

a9d30f33

KVM: nVMX: Allow setting the VMXE bit in CR4 · 5e1746d6

Nadav Har'El authored May 25, 2011

This patch allows the guest to enable the VMXE bit in CR4, which is a
prerequisite to running VMXON.

Whether to allow setting the VMXE bit now depends on the architecture (svm
or vmx), so its checking has moved to kvm_x86_ops->set_cr4(). This function
now returns an int: If kvm_x86_ops->set_cr4() returns 1, __kvm_set_cr4()
will also return 1, and this will cause kvm_set_cr4() will throw a #GP.

Turning on the VMXE bit is allowed only when the nested VMX feature is
enabled, and turning it off is forbidden after a vmxon.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

5e1746d6

KVM: nVMX: Implement VMXON and VMXOFF · ec378aee

Nadav Har'El authored May 25, 2011

This patch allows a guest to use the VMXON and VMXOFF instructions, and
emulates them accordingly. Basically this amounts to checking some
prerequisites, and then remembering whether the guest has enabled or disabled
VMX operation.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ec378aee

KVM: nVMX: Add "nested" module option to kvm_intel · 801d3424

Nadav Har'El authored May 25, 2011

This patch adds to kvm_intel a module option "nested". This option controls
whether the guest can use VMX instructions, i.e., whether we allow nested
virtualization. A similar, but separate, option already exists for the
SVM module.

This option currently defaults to 0, meaning that nested VMX must be
explicitly enabled by giving nested=1. When nested VMX matures, the default
should probably be changed to enable nested VMX by default - just like
nested SVM is currently enabled by default.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

801d3424

KVM: x86 emulator: Avoid clearing the whole decode_cache · b5c9ff73

Takuya Yoshikawa authored May 25, 2011

During tracing the emulator, we noticed that init_emulate_ctxt()
sometimes took a bit longer time than we expected.

This patch is for mitigating the problem by some degree.

By looking into the function, we soon notice that it clears the whole
decode_cache whose size is about 2.5K bytes now.  Furthermore, most of
the bytes are taken for the two read_cache arrays, which are used only
by a few instructions.

Considering the fact that we are not assuming the cache arrays have
been cleared when we store actual data, we do not need to clear the
arrays: 2K bytes elimination.  In addition, we can avoid clearing the
fetch_cache and regs arrays.

This patch changes the initialization not to clear the arrays.

On our 64-bit host, init_emulate_ctxt() becomes 0.3 to 0.5us faster with
this patch applied.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

b5c9ff73

KVM: x86 emulator: Clean up init_emulate_ctxt() · adf52235

Takuya Yoshikawa authored May 25, 2011

Use a local pointer to the emulate_ctxt for simplicity.  Then, arrange
the hard-to-read mode selection lines neatly.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

adf52235

KVM: Clean up error handling during VCPU creation · d780592b

Jan Kiszka authored May 23, 2011

So far kvm_arch_vcpu_setup is responsible for freeing the vcpu struct if
it fails. Move this confusing resonsibility back into the hands of
kvm_vm_ioctl_create_vcpu. Only kvm_arch_vcpu_setup of x86 is affected,
all other archs cannot fail.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d780592b

KVM: VMX: Keep list of loaded VMCSs, instead of vcpus · d462b819

Nadav Har'El authored May 24, 2011

In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.

The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may be have been last loaded on a different cpu.

So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested we will also use the same
structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
currently active VMCS.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Acked-by: Acked-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d462b819

KVM: Sanitize cpuid · 24c82e57

Avi Kivity authored May 18, 2011

Instead of blacklisting known-unsupported cpuid leaves, whitelist known-
supported leaves.  This is more conservative and prevents us from reporting
features we don't support.  Also whitelist a few more leaves while at it.
Signed-off-by: Avi Kivity <avi@redhat.com>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

24c82e57

KVM: MMU: cleanup for dropping parent pte · bcdd9a93

Xiao Guangrong authored May 15, 2011

Introduce drop_parent_pte to remove the rmap of parent pte and
clear parent pte
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

bcdd9a93

KVM: MMU: cleanup for kvm_mmu_page_unlink_children · 38e3b2b2

Xiao Guangrong authored May 15, 2011

Cleanup the same operation between kvm_mmu_page_unlink_children and
mmu_pte_write_zap_pte
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

38e3b2b2

KVM: MMU: remove the arithmetic of parent pte rmap · 67052b35

Xiao Guangrong authored May 15, 2011

Parent pte rmap and page rmap are very similar, so use the same arithmetic
for them
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

67052b35

KVM: MMU: abstract the operation of rmap · 53c07b18

Xiao Guangrong authored May 15, 2011

Abstract the operation of rmap to spte_list, then we can use it for the
reverse mapping of parent pte in the later patch
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

53c07b18

KVM: fix uninitialized warning · 1249b96e

Xiao Guangrong authored May 15, 2011

Fix:

 warning: ‘cs_sel’ may be used uninitialized in this function
 warning: ‘ss_sel’ may be used uninitialized in this function
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

1249b96e

KVM: use __copy_to_user/__clear_user to write guest page · 8b0cedff

Xiao Guangrong authored May 15, 2011

Simply use __copy_to_user/__clear_user to write guest page since we have
already verified the user address when the memslot is set
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

8b0cedff

KVM: MMU: optimize pte write path if don't have protected sp · 332b207d

Xiao Guangrong authored May 15, 2011

Simply return from kvm_mmu_pte_write path if no shadow page is
write-protected, then we can avoid to walk all shadow pages and hold
mmu-lock
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

332b207d

KVM: VMX: always_inline VMREADs · 96304217

Avi Kivity authored May 15, 2011

vmcs_readl() and friends are really short, but gcc thinks they are long because of
the out-of-line exception handlers.  Mark them always_inline to clear the
misunderstanding.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

96304217

KVM: VMX: Move VMREAD cleanup to exception handler · 5e520e62

Avi Kivity authored May 15, 2011

We clean up a failed VMREAD by clearing the output register.  Do
it in the exception handler instead of unconditionally.  This is
worthwhile since there are more than a hundred call sites.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

5e520e62

KVM: x86 emulator: Stop passing ctxt->ops as arg of emul functions · 7b105ca2

Takuya Yoshikawa authored May 15, 2011

Dereference it in the actual users.

This not only cleans up the emulator but also makes it easy to convert
the old emulation functions to the new em_xxx() form later.

Note: Remove some inline keywords to let the compiler decide inlining.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7b105ca2

KVM: x86 emulator: Stop passing ctxt->ops as arg of decode helpers · ef5d75cc

Takuya Yoshikawa authored May 15, 2011

Dereference it in the actual users: only do_insn_fetch_byte().

This is consistent with the way __linearize() dereferences it.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ef5d75cc

KVM: x86 emulator: Place insn_fetch helpers together · 67cbc90d

Takuya Yoshikawa authored May 15, 2011

The two macros need special care to use:
  Assume rc, ctxt, ops and done exist outside of them.
  Can goto outside.

Considering the fact that these are used only in decode functions,
moving these right after do_insn_fetch() seems to be a right thing
to improve the readability.

We also rename do_fetch_insn_byte() to do_insn_fetch_byte() to be
consistent.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

67cbc90d

KVM: Document KVM_GET_LAPIC, KVM_SET_LAPIC ioctl · e7677933

Avi Kivity authored May 11, 2011

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e7677933

07 Jul, 2011 1 commit
- Merge branch 'gpio/merge' of git://git.secretlab.ca/git/linux-2.6 · 4dd1b49c
  Linus Torvalds authored Jul 06, 2011
```
* 'gpio/merge' of git://git.secretlab.ca/git/linux-2.6:
  gpio: tps65910: add missing breaks in tps65910_gpio_init
```
  4dd1b49c
06 Jul, 2011 7 commits

Documentation: fix cgroup blkio throttle filenames · 9b61fc4c

Andrea Righi authored Jul 06, 2011

All the blkio.throttle.* file names are incorrectly reported without
".throttle" in the documentation. Fix it.
Signed-off-by: Andrea Righi <andrea@betterlinux.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9b61fc4c

Documentation: update CodingStyle memory allocators · 316b3799

Jesper Juhl authored Jul 06, 2011

The list of available general purpose memory allocators in
Documentation/CodingStyle chapter 14 is incomplete. This patch adds
the missing vzalloc() to the list.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

316b3799

MAINTAINERS: move kernel-doc patches location · 0dcb6d73

Randy Dunlap authored Jul 06, 2011

Move location of quilt series for kernel-doc patches.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0dcb6d73

Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 · de3796e7

Linus Torvalds authored Jul 06, 2011

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (46 commits)
  [media] rc: call input_sync after scancode reports
  [media] imon: allow either proto on unknown 0xffdc
  [media] imon: auto-config ffdc 7e device
  [media] saa7134: fix raw IR timeout value
  [media] rc: fix ghost keypresses with certain hw
  [media] [staging] lirc_serial: allocate irq at init time
  [media] lirc_zilog: fix spinning rx thread
  [media] keymaps: fix table for pinnacle pctv hd devices
  [media] ite-cir: 8709 needs to use pnp resource 2
  [media] V4L: mx1-camera: fix uninitialized variable
  [media] omap_vout: Added check in reqbuf & mmap for buf_size allocation
  [media] OMAP_VOUT: Change hardcoded device node number to -1
  [media] OMAP_VOUTLIB: Fix wrong resizer calculation
  [media] uvcvideo: Disable the queue when failing to start
  [media] uvcvideo: Remove buffers from the queues when freeing
  [media] uvcvideo: Ignore entities for terminals with no supported format
  [media] v4l: Don't access media entity after is has been destroyed
  [media] media: omap3isp: fix a potential NULL deref
  [media] media: vb2: fix allocation failure check
  [media] media: vb2: reset queued_count value during queue reinitialization
  ...

Fix up trivial conflict in MAINTAINERS as per Mauro

de3796e7

FDPIC: Fix memory leak · bcb65a79

Davidlohr Bueso authored Jul 06, 2011

The shdr4extnum variable isn't being freed in the cleanup process of
elf_fdpic_core_dump().
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bcb65a79

resource: ability to resize an allocated resource · 23c570a6

Ram Pai authored Jul 05, 2011

Provides the ability to resize a resource that is already allocated.
This functionality is put in place to support reallocation needs of
pci resources.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

23c570a6

fs: fix lock initialization · a51cb91d

Miklos Szeredi authored Jul 06, 2011

locks_alloc_lock() assumed that the allocated struct file_lock is
already initialized to zero members.  This is only true for the first
allocation of the structure, after reuse some of the members will have
random values.

This will for example result in passing random fl_start values to
userspace in fuse for FL_FLOCK locks, which is an information leak at
best.

Fix by reinitializing those members which may be non-zero after freeing.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a51cb91d