Commits · cfaa790a3fb8a7efa98f4a6457e19dc3a0db35d3 · Kirill Smelkov / linux

21 Jan, 2015 1 commit

kvm: Fix CR3_PCID_INVD type on 32-bit · cfaa790a

Borislav Petkov authored Jan 15, 2015

arch/x86/kvm/emulate.c: In function ‘check_cr_write’:
arch/x86/kvm/emulate.c:3552:4: warning: left shift count >= width of type
    rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD;

happens because sizeof(UL) on 32-bit is 4 bytes but we shift it 63 bits
to the left.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cfaa790a

20 Jan, 2015 2 commits

KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue · 54750f2c

Marcelo Tosatti authored Jan 20, 2015

SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp
and rdtsc is larger than a given threshold:

 * If we get more than the below threshold into the future, we rerequest
 * the real time from the host again which has only little offset then
 * that we need to adjust using the TSC.
 *
 * For now that threshold is 1/5th of a jiffie. That should be good
 * enough accuracy for completely broken systems, but also give us swing
 * to not call out to the host all the time.
 */
#define PVCLOCK_DELTA_MAX ((1000000000ULL / HZ) / 5)

Disable masterclock support (which increases said delta) in case the
boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW.

Upstreams kernels which support pvclock vsyscalls (and therefore make
use of PVCLOCK_STABLE_BIT) use MSR_KVM_SYSTEM_TIME_NEW.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

54750f2c

KVM: fix "Should it be static?" warnings from sparse · 69b0049a

Fengguang Wu authored Jan 19, 2015

arch/x86/kvm/x86.c:495:5: sparse: symbol 'kvm_read_nested_guest_page' was not declared. Should it be static?
arch/x86/kvm/x86.c:646:5: sparse: symbol '__kvm_set_xcr' was not declared. Should it be static?
arch/x86/kvm/x86.c:1183:15: sparse: symbol 'max_tsc_khz' was not declared. Should it be static?
arch/x86/kvm/x86.c:1237:6: sparse: symbol 'kvm_track_tsc_matching' was not declared. Should it be static?
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

69b0049a

19 Jan, 2015 2 commits

Optimize TLB flush in kvm_mmu_slot_remove_write_access. · d91ffee9

Kai Huang authored Jan 12, 2015

No TLB flush is needed when there's no valid rmap in memory slot.
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d91ffee9

x86: kvm: vmx: Remove some unused functions · 0c55d6d9

Rickard Strandqvist authored Jan 11, 2015

Removes some functions that are not used anywhere:
cpu_has_vmx_eptp_writeback() cpu_has_vmx_eptp_uncacheable()

This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0c55d6d9

09 Jan, 2015 3 commits

KVM: x86: #PF error-code on R/W operations is wrong · c205fb7d

Nadav Amit authored Dec 25, 2014

When emulating an instruction that reads the destination memory operand (i.e.,
instructions without the Mov flag in the emulator), the operand is first read.
If a page-fault is detected in this phase, the error-code which would be
delivered to the VM does not indicate that the access that caused the exception
is a write one. This does not conform with real hardware, and may cause the VM
to enter the page-fault handler twice for no reason (once for read, once for
write).
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c205fb7d

KVM: x86: flush TLB when D bit is manually changed. · 7e71a59b

Kai Huang authored Jan 09, 2015

When software changes D bit (either from 1 to 0, or 0 to 1), the
corresponding TLB entity in the hardware won't be updated immediately. We
should flush it to guarantee the consistence of D bit between TLB and
MMU page table in memory.  This is especially important when clearing
the D bit, since it may cause false negatives in reporting dirtiness.

Sanity test was done on my machine with Intel processor.
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
[Check A bit too. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7e71a59b

KVM: x86: allow TSC deadline timer on all hosts · defcf51f

Radim Krčmář authored Jan 08, 2015

Emulation does not utilize the feature.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

defcf51f

08 Jan, 2015 20 commits

kvm: x86: Remove kvm_make_request from lapic.c · bab5bb39

Nicholas Krause authored Jan 01, 2015

Adds a function kvm_vcpu_set_pending_timer instead of calling
kvm_make_request in lapic.c.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bab5bb39

KVM: x86: Access to LDT/GDT that wraparound is incorrect · edccda7c

Nadav Amit authored Dec 25, 2014

When access to descriptor in LDT/GDT wraparound outside long-mode, the address
of the descriptor should be truncated to 32-bit.  Citing Intel SDM 2.1.1.1
"Global and Local Descriptor Tables in IA-32e Mode": "GDTR and LDTR registers
are expanded to 64-bits wide in both IA-32e sub-modes (64-bit mode and
compatibility mode)."

So in other cases, we need to truncate. Creating new function to return a
pointer to descriptor table to avoid too much code duplication.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Wrap 64-bit check with #ifdef CONFIG_X86_64, to avoid a "right shift count
 >= width of type" warning and consequent undefined behavior. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

edccda7c

KVM: x86: Do not set access bit on accessed segments · e2cefa74

Nadav Amit authored Dec 25, 2014

When segment is loaded, the segment access bit is set unconditionally. In
fact, it should be set conditionally, based on whether the segment had the
accessed bit set before. In addition, it can improve performance.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e2cefa74

KVM: x86: POP [ESP] is not emulated correctly · ab708099

Nadav Amit authored Dec 25, 2014

According to Intel SDM: "If the ESP register is used as a base register for
addressing a destination operand in memory, the POP instruction computes the
effective address of the operand after it increments the ESP register."

The current emulation does not behave so. The fix required to waste another
of the precious instruction flags and to check the flag in decode_modrm.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ab708099

KVM: x86: em_call_far should return failure result · 80976dbb

Nadav Amit authored Dec 25, 2014

Currently, if em_call_far fails it returns success instead of the resulting
error-code. Fix it.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

80976dbb

KVM: x86: JMP/CALL using call- or task-gate causes exception · 3dc4bc4f

Nadav Amit authored Dec 25, 2014

The KVM emulator does not emulate JMP and CALL that target a call gate or a
task gate.  This patch does not try to implement these scenario as they are
presumably rare; yet it returns X86EMUL_UNHANDLEABLE error in such cases
instead of generating an exception.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3dc4bc4f

KVM: x86: fnstcw and fnstsw may cause spurious exception · 16bebefe

Nadav Amit authored Dec 25, 2014

Since the operand size of fnstcw and fnstsw is updated during the execution,
the emulation may cause spurious exceptions as it reads the memory beforehand.

Marking these instructions as Mov (since the previous value is ignored) and
DstMem16 to simplify the setting of operand size.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

16bebefe

KVM: x86: pop sreg accesses only 2 bytes · 3313bc4e

Nadav Amit authored Dec 25, 2014

Although pop sreg updates RSP according to the operand size, only 2 bytes are
read. The current behavior may result in incorrect #GP or #PF exceptions.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3313bc4e

KVM: x86: mmu: replace assertions with MMU_WARN_ON, a conditional WARN_ON · fa4a2c08

Paolo Bonzini authored Oct 02, 2013

This makes the direction of the conditions consistent with code that
is already using WARN_ON.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fa4a2c08

KVM: x86: mmu: remove ASSERT(vcpu) · 4c1a50de

Paolo Bonzini authored Oct 02, 2013

Because ASSERT is just a printk, these would oops right away.
The assertion thus hardly adds anything.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4c1a50de

KVM: x86: mmu: remove argument to kvm_init_shadow_mmu and kvm_init_shadow_ept_mmu · ad896af0

Paolo Bonzini authored Oct 02, 2013

The initialization function in mmu.c can always use walk_mmu, which
is known to be vcpu->arch.mmu.  Only init_kvm_nested_mmu is used to
initialize vcpu->arch.nested_mmu.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ad896af0

KVM: x86: mmu: do not use return to tail-call functions that return void · e0c6db3e
Paolo Bonzini authored Dec 23, 2014
```
This is, pedantically, not valid C.  It also looks weird.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
```
e0c6db3e

KVM: x86: add tracepoint to wait_lapic_expire · 6c19b753

Marcelo Tosatti authored Dec 16, 2014

Add tracepoint to wait_lapic_expire.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[Remind reader if early or late. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6c19b753

KVM: x86: add option to advance tscdeadline hrtimer expiration · d0659d94

Marcelo Tosatti authored Dec 16, 2014

For the hrtimer which emulates the tscdeadline timer in the guest,
add an option to advance expiration, and busy spin on VM-entry waiting
for the actual expiration time to elapse.

This allows achieving low latencies in cyclictest (or any scenario
which requires strict timing regarding timer expiration).

Reduces average cyclictest latency from 12us to 8us
on Core i5 desktop.

Note: this option requires tuning to find the appropriate value
for a particular hardware/guest combination. One method is to measure the
average delay between apic_timer_fn and VM-entry.
Another method is to start with 1000ns, and increase the value
in say 500ns increments until avg cyclictest numbers stop decreasing.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d0659d94

KVM: x86: add method to test PIR bitmap vector · 7c6a98df

Marcelo Tosatti authored Dec 16, 2014

kvm_x86_ops->test_posted_interrupt() returns true/false depending
whether 'vector' is set.

Next patch makes use of this interface.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7c6a98df

kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv · b4eef9b3

Tiejun Chen authored Dec 22, 2014

In most cases calling hwapic_isr_update(), we always check if
kvm_apic_vid_enabled() == 1, but actually,
kvm_apic_vid_enabled()
    -> kvm_x86_ops->vm_has_apicv()
        -> vmx_vm_has_apicv() or '0' in svm case
            -> return enable_apicv && irqchip_in_kernel(kvm)

So its a little cost to recall vmx_vm_has_apicv() inside
hwapic_isr_update(), here just NULL out hwapic_isr_update() in
case of !enable_apicv inside hardware_setup() then make all
related stuffs follow this. Note we don't check this under that
condition of irqchip_in_kernel() since we should make sure
definitely any caller don't work  without in-kernel irqchip.
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b4eef9b3

KVM: x86: Remove FIXMEs in emulate.c for the function,task_switch_32 · 5ff22e7e

Nicholas Krause authored Dec 18, 2014

Remove FIXME comments about needing fault addresses to be returned.  These
are propaagated from walk_addr_generic to gva_to_gpa and from there to
ops->read_std and ops->write_std.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5ff22e7e

KVM: nVMX: consult PFEC_MASK and PFEC_MATCH when generating #PF VM-exit · 19d5f10b

Eugene Korenevsky authored Dec 16, 2014

When generating #PF VM-exit, check equality:
(PFEC & PFEC_MASK) == PFEC_MATCH
If there is equality, the 14 bit of exception bitmap is used to take decision
about generating #PF VM-exit. If there is inequality, inverted 14 bit is used.
Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

19d5f10b

KVM: nVMX: Improve nested msr switch checking · e9ac033e

Eugene Korenevsky authored Dec 11, 2014

This patch improve checks required by Intel Software Developer Manual.
 - SMM MSRs are not allowed.
 - microcode MSRs are not allowed.
 - check x2apic MSRs only when LAPIC is in x2apic mode.
 - MSR switch areas must be aligned to 16 bytes.
 - address of first and last byte in MSR switch areas should not set any bits
   beyond the processor's physical-address width.

Also it adds warning messages on failures during MSR switch. These messages
are useful for people who debug their VMMs in nVMX.
Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e9ac033e

KVM: nVMX: Add nested msr load/restore algorithm · ff651cb6

Wincy Van authored Dec 11, 2014

Several hypervisors need MSR auto load/restore feature.
We read MSRs from VM-entry MSR load area which specified by L1,
and load them via kvm_set_msr in the nested entry.
When nested exit occurs, we get MSRs via kvm_get_msr, writing
them to L1`s MSR store area. After this, we read MSRs from VM-exit
MSR load area, and load them via kvm_set_msr.
Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ff651cb6

06 Jan, 2015 1 commit
- Linux 3.19-rc3 · b1940cd2
  Linus Torvalds authored Jan 05, 2015
  
  b1940cd2
05 Jan, 2015 3 commits

Merge tag 'powerpc-3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux · 79b8cb97

Linus Torvalds authored Jan 05, 2015

Pull powerpc fixes from Michael Ellerman:

 - Wire up sys_execveat(). Tested on 32 & 64 bit.

 - Fix for kdump on LE systems with cpus hot unplugged.

 - Revert Anton's fix for "kernel BUG at kernel/smpboot.c:134!", this
   broke other platforms, we'll do a proper fix for 3.20.

* tag 'powerpc-3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  Revert "powerpc: Secondary CPUs must set cpu_callin_map after setting active and online"
  powerpc/kdump: Ignore failure in enabling big endian exception during crash
  powerpc: Wire up sys_execveat() syscall

79b8cb97

Merge tag 'please-pull-syscall' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · f40bde85

Linus Torvalds authored Jan 05, 2015

Pull ia64 fixlet from Tony Luck:
 "Add execveat syscall"

* tag 'please-pull-syscall' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  [IA64] Enable execveat syscall for ia64

f40bde85

[IA64] Enable execveat syscall for ia64 · b739896d

Tony Luck authored Jan 05, 2015

See commit 51f39a1f
    syscalls: implement execveat() system call
Signed-off-by: Tony Luck <tony.luck@intel.com>

b739896d

04 Jan, 2015 4 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · 693a30b8

Linus Torvalds authored Jan 04, 2015

Pull UML fixes from Richard Weinberger:
 "Two fixes for UML regressions. Nothing exciting"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  x86, um: actually mark system call tables readonly
  um: Skip futex_atomic_cmpxchg_inatomic() test

693a30b8

Revert "ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo" · 4bf9636c

Pavel Machek authored Jan 04, 2015

Commit 9fc2105a ("ARM: 7830/1: delay: don't bother reporting
bogomips in /proc/cpuinfo") breaks audio in python, and probably
elsewhere, with message

  FATAL: cannot locate cpu MHz in /proc/cpuinfo

I'm not the first one to hit it, see for example

  https://theredblacktree.wordpress.com/2014/08/10/fatal-cannot-locate-cpu-mhz-in-proccpuinfo/
  https://devtalk.nvidia.com/default/topic/765800/workaround-for-fatal-cannot-locate-cpu-mhz-in-proc-cpuinf/?offset=1

Reading original changelog, I have to say "Stop breaking working setups.
You know who you are!".
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4bf9636c

x86, um: actually mark system call tables readonly · b485342b

Daniel Borkmann authored Jan 03, 2015

Commit a074335a ("x86, um: Mark system call tables readonly") was
supposed to mark the sys_call_table in UML as RO by adding the const,
but it doesn't have the desired effect as it's nevertheless being placed
into the data section since __cacheline_aligned enforces sys_call_table
being placed into .data..cacheline_aligned instead. We need to use
the ____cacheline_aligned version instead to fix this issue.

Before:

$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
                 U sys_writev
0000000000000000 D sys_call_table
0000000000000000 D syscall_table_size

After:

$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
                 U sys_writev
0000000000000000 R sys_call_table
0000000000000000 D syscall_table_size

Fixes: a074335a ("x86, um: Mark system call tables readonly")
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

b485342b

um: Skip futex_atomic_cmpxchg_inatomic() test · f911d731

Richard Weinberger authored Dec 10, 2014

futex_atomic_cmpxchg_inatomic() does not work on UML because
it triggers a copy_from_user() in kernel context.
On UML copy_from_user() can only be used if the kernel was called
by a real user space process such that UML can use ptrace()
to fetch the value.
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Tested-by: Daniel Walter <d.walter@0x90.at>

f911d731

02 Jan, 2015 3 commits

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · d753856c

Linus Torvalds authored Jan 02, 2015

Pull SCSI fixes from James Bottomley:
 "This is a set of three fixes: one to correct an abort path thinko
  causing failures (and a panic) in USB on device misbehaviour, One to
  fix an out of order issue in the fnic driver and one to match discard
  expectations to qemu which otherwise cause Linux to behave badly as a
  guest"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  SCSI: fix regression in scsi_send_eh_cmnd()
  fnic: IOMMU Fault occurs when IO and abort IO is out of order
  sd: tweak discard heuristics to work around QEMU SCSI issue

d753856c

Merge tag 'sound-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 6a4bfa7c

Linus Torvalds authored Jan 02, 2015

Pull sound fixes from Takashi Iwai:
 "Nothing too exciting as a new year's start here: most of fixes are for
  ASoC, a boot crash fix on OMAP for deferred probe, a few driver
  specific fixes (Intel, dwc, rockchip, rt5677), in addition to typo
  fixes in kerneldoc comments for PCM"

* tag 'sound-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: pcm: Fix kerneldoc for params_*() functions
  ASoC: rockchip: i2s: fix maxburst of dma data to 4
  ASoC: rockchip: i2s: fix error defination of transmit data level
  ASoC: Intel: correct the fixed free block allocation
  ASoC: rt5677: fixed rt5677_dsp_vad_put rt5677_dsp_vad_get panic
  ASoC: Intel: Fix BYTCR machine driver MODULE_ALIAS
  ASoC: Intel: Fix BYTCR firmware name
  ASoC: dwc: Iterate over all channels
  ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap
  ASoC: Intel: Add I2C dependency to two new machines
  ASoC: dapm: Remove snd_soc_of_parse_audio_routing() due to deferred probe

6a4bfa7c

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · d7e19bd8

Linus Torvalds authored Jan 02, 2015

Pull vhost cleanup and virtio bugfix
 "There's a single change here, fixing a vhost bug where vhost
  initialization fails due to used ring alignment check being too
  strict"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: relax used address alignment
  virtio_ring: document alignment requirements

d7e19bd8

31 Dec, 2014 1 commit

Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit · 5e0f872c

Linus Torvalds authored Dec 31, 2014

Pull audit fix from Paul Moore:
 "One audit patch to resolve a panic/oops when recording filenames in
  the audit log, see the mail archive link below.

  The fix isn't as nice as I would like, as it involves an allocate/copy
  of the filename, but it solves the problem and the overhead should
  only affect users who have configured audit rules involving file
  names.

  We'll revisit this issue with future kernels in an attempt to make
  this suck less, but in the meantime I think this fix should go into
  the next release of v3.19-rcX.

  [ https://marc.info/?t=141986927600001&r=1&w=2 ]"

* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
  audit: create private file name copies when auditing inodes

5e0f872c