Commits · 6ae1574c2a24eec5efa8bac305a8f87c839acc64 · Kirill Smelkov / linux

22 Jun, 2017 3 commits

KVM: s390: implement instruction execution protection for emulated ifetch · 6ae1574c

Christian Borntraeger authored Jun 07, 2017

While currently only used to fetch the original instruction on failure
for getting the instruction length code, we should make the page table
walking code future proof.
Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

6ae1574c

KVM: s390: ioctls to get and set guest storage attributes · 4036e387

Claudio Imbrenda authored Aug 04, 2016

* Add the struct used in the ioctls to get and set CMMA attributes.
* Add the two functions needed to get and set the CMMA attributes for
  guest pages.
* Add the two ioctls that use the aforementioned functions.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

4036e387

KVM: s390: CMMA tracking, ESSA emulation, migration mode · 190df4a2

Claudio Imbrenda authored Aug 04, 2016

* Add a migration state bitmap to keep track of which pages have dirty
  CMMA information.
* Disable CMMA by default, so we can track if it's used or not. Enable
  it on first use like we do for storage keys (unless we are doing a
  migration).
* Create a VM attribute to enter and leave migration mode.
* In migration mode, CMMA is disabled in the SIE block, so ESSA is
  always interpreted and emulated in software.
* Free the migration state on VM destroy.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

190df4a2

08 Jun, 2017 17 commits

tools/kvm_stat: display guest list in pid/guest selection screens · 865279c5

Stefan Raspl authored Jun 07, 2017

Display a (possibly inaccurate) list of all running guests. Note that we
leave a bit of extra room above the list for potential error messages.
Furthermore, we deliberately do not reject pids or guest names that are
not in our list, as we cannot rule out that our fuzzy approach might be
in error somehow.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

865279c5

tools/kvm_stat: add new interactive command 'o' · 6667ae8f

Stefan Raspl authored Jun 07, 2017

Add new interactive command 'o' to toggle sorting by 'CurAvg/s' (default)
and 'Total' columns.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6667ae8f
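
The toggle described above can be sketched as a sort on one of two columns. This is only an illustration: the row layout `(event, total, cur_avg)` and the helper name `sort_rows()` are assumptions, not taken from kvm_stat.

```python
def sort_rows(rows, by_total=False):
    """Sort (event, total, cur_avg) tuples in descending order.

    Hypothetical helper: sorts by 'CurAvg/s' by default and by 'Total'
    when by_total is True, mirroring the 'o' toggle.
    """
    key_index = 1 if by_total else 2
    return sorted(rows, key=lambda row: row[key_index], reverse=True)


rows = [("exits", 900, 10), ("halt_exits", 100, 50)]
by_avg = sort_rows(rows)                   # halt_exits first
by_total = sort_rows(rows, by_total=True)  # exits first
```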

tools/kvm_stat: add new interactive command 's' · 64eefad2

Stefan Raspl authored Jun 07, 2017

Add new command 's' to modify the update interval. Limited to a maximum of
25.5 sec and a minimum of 0.1 sec, since curses cannot handle longer
and shorter delays respectively.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

64eefad2
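
The quoted limits line up with the half-delay mode of curses, whose timeout is given in tenths of a second and must lie between 1 and 255. A minimal sketch of such a range check follows; the helper name `interval_to_tenths()` is hypothetical and the code is not taken from kvm_stat.

```python
MIN_INTERVAL = 0.1   # seconds, lower bound quoted in the commit message
MAX_INTERVAL = 25.5  # seconds, upper bound quoted in the commit message


def interval_to_tenths(seconds):
    """Convert an update interval to tenths of a second (hypothetical helper).

    Raises ValueError when the interval is outside the supported range.
    """
    tenths = int(round(seconds * 10))
    if not 1 <= tenths <= 255:
        raise ValueError("interval must be between %.1f and %.1f sec"
                         % (MIN_INTERVAL, MAX_INTERVAL))
    return tenths
```

The returned value is in the unit that `curses.halfdelay()` expects.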

tools/kvm_stat: add new interactive command 'h' · 1fdea7b2

Stefan Raspl authored Jun 07, 2017

Display interactive commands reference on 'h'.
While at it, sort interactive commands alphabetically in various places.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1fdea7b2

tools/kvm_stat: rename 'Current' column to 'CurAvg/s' · 38e89c37

Stefan Raspl authored Jun 07, 2017

'Current' can be misleading as it doesn't tell whether this is the number
of events in the last interval or the current average per second.
Note that this necessitates widening the respective column by one more
character.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

38e89c37

tools/kvm_stat: make heading look a bit more like 'top' · f6d75310

Stefan Raspl authored Jun 07, 2017

Print header in standout font just like the 'top' command does.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f6d75310

tools/kvm_stat: display message indicating lack of events · 57253937

Stefan Raspl authored Jun 07, 2017

Give users some indication on the reason why no data is displayed on the
screen yet.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

57253937

tools/kvm_stat: show cursor in selection screens · 62d1b6cc

Stefan Raspl authored Jun 07, 2017

Show the cursor in the interactive screens to specify pid, filter or guest
name as an orientation for the user.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

62d1b6cc

tools/kvm_stat: move functions to corresponding classes · 099a2dfc

Stefan Raspl authored Jun 07, 2017

Quite a few of the functions are used only in a single class. Move these
functions accordingly to improve the overall structure.
Furthermore, introduce a base class for the providers, which might also
come in handy for future extensions.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

099a2dfc

tools/kvm_stat: simplify initializers · c469117d

Stefan Raspl authored Jun 07, 2017

Simplify a couple of initialization routines:
* TracepointProvider and DebugfsProvider: Pass pid into __init__() instead
  of switching to the requested value in an extra call after initializing
  to the default first.
* Pass a single options object into Stats.__init__(), delaying options
  evaluation accordingly, instead of evaluating options first and passing
  several parts of the options object to Stats.__init__() individually.
* Eliminate Stats.update_provider_pid(), since this 2-line function is now
  used in a single place only.
* Remove extra call to update_drilldown() in Tui.__init__() by getting the
  value of options.fields right initially when parsing options.
* Simplify get_providers() logic.
* Avoid duplicate fields initialization by handling it once in the
  providers' __init__() methods.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c469117d
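
The first and last bullets (pid passed into `__init__()`, fields initialized once in the provider's `__init__()`) can be sketched roughly as below. The class names, the meaning of `pid=0` and the field names are assumptions made for illustration; this is not the kvm_stat code.

```python
class Provider(object):
    """Hypothetical provider base class."""

    def __init__(self, pid=0):
        # The requested pid is set once, at construction time, instead of
        # being switched in a separate call after a default initialization.
        self.pid = pid                      # 0 stands for "all guests" here
        self.fields = self.default_fields() # fields are set up exactly once

    def default_fields(self):
        return []


class ExampleProvider(Provider):
    """Hypothetical concrete provider with made-up field names."""

    def default_fields(self):
        return ["exits", "halt_exits"]


provider = ExampleProvider(pid=1234)
```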

tools/kvm_stat: remove extra statement · 5e3823a4

Stefan Raspl authored Jun 07, 2017

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5e3823a4

tools/kvm_stat: removed unused function · 42a947b7

Stefan Raspl authored Jun 07, 2017

Function available_fields() is not used in any place.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

42a947b7

tools/kvm_stat: simplify line print logic · 5a7d11f8

Stefan Raspl authored Jun 07, 2017

Simplify line print logic for header and data lines in interactive mode
as previously suggested by Radim.
While at it, add a space between the first two columns to avoid the
total bleeding into the event name.
Furthermore, for column 'Current', differentiate between no events being
reported (empty 'Current' column) vs the case where events were reported
but the average was rounded down to zero ('0' in 'Current' column), for
the folks who appreciate the difference.
Finally: Only skip events which were not reported at all yet, instead of
events that don't have a value in the current interval.
Considered using constants for the field widths in the format strings.
However, that would make things a bit more complicated, and considering
that there are only two places where output happens, I figured it isn't
worth the trouble.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5a7d11f8
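
The distinction between an empty 'Current' column and a '0' can be sketched as follows. The helper name `format_current()` and its arguments are hypothetical; the real script formats whole lines rather than single cells.

```python
def format_current(delta, elapsed):
    """Text for the 'Current' column (hypothetical helper, not kvm_stat code).

    delta is the number of events seen in the interval, elapsed the interval
    length in seconds.
    """
    if not delta:
        return ""                            # nothing reported: leave empty
    return str(int(round(delta / elapsed)))  # may legitimately print '0'
```

With one event in a three second interval the average rounds to zero, so the cell shows '0' rather than staying blank.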

tools/kvm_stat: remove unnecessary header redraws · 2da9d4aa

Stefan Raspl authored Jun 07, 2017

Certain interactive commands will not modify any information displayed in
the header, hence we can skip them.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2da9d4aa

tools/kvm_stat: fix undue use of initial sleeptime · 81468d73

Stefan Raspl authored Jun 07, 2017

We should not use the initial sleeptime for any key press that does not
switch to a different screen, as that introduces an unaesthetic flicker due
to two updates in quick succession.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

81468d73

tools/kvm_stat: fix event counts display for interrupted intervals · 124c2fc9

Stefan Raspl authored Jun 07, 2017

When an update interval is interrupted via key press (e.g. space), the
'Current' column value is calculated using the full interval length
instead of the elapsed time, which leads to lower than actual numbers.
Furthermore, the value should be rounded, not truncated.
This is fixed by using the actual elapsed time for the calculation.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

124c2fc9
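
A small worked example of the difference, as a sketch only (both function names are hypothetical): 10 events arrive and the user presses a key 0.5 sec into a 3 sec interval.

```python
def rate_full_interval(delta, interval):
    """Old behaviour: divide by the nominal interval and truncate."""
    return int(delta / interval)


def rate_elapsed(delta, elapsed):
    """Fixed behaviour: divide by the time that really passed and round."""
    return int(round(delta / elapsed))


old = rate_full_interval(10, 3.0)  # underestimates the rate
new = rate_elapsed(10, 0.5)        # reflects the interrupted interval
```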

tools/kvm_stat: fix typo · 773bffee

Stefan Raspl authored Jun 07, 2017

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

773bffee

07 Jun, 2017 5 commits

KVM: nVMX: Update vmcs12->guest_linear_address on nested VM-exit · d281e13b

Jim Mattson authored Jun 01, 2017

The guest-linear address field is set for VM exits due to attempts to
execute LMSW with a memory operand and VM exits due to attempts to
execute INS or OUTS for which the relevant segment is usable,
regardless of whether or not EPT is in use.

Fixes: 119a9c01 ("KVM: nVMX: pass valid guest linear-address to the L1")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

d281e13b

KVM: nVMX: Don't update vmcs12->xss_exit_bitmap on nested VM-exit · d923fcf6

Jim Mattson authored Jun 01, 2017

The XSS-exiting bitmap is a VMCS control field that does not change
while the CPU is in non-root mode. Transferring the unchanged value
from vmcs02 to vmcs12 is unnecessary.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

d923fcf6

kvm: vmx: Check value written to IA32_BNDCFGS · 4531662d

Jim Mattson authored May 23, 2017

Bits 11:2 must be zero and the linear address in bits 63:12 must be
canonical. Otherwise, WRMSR(BNDCFGS) should raise #GP.

Fixes: 0dd376e7 ("KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

4531662d
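
The two conditions can be modelled as a pure predicate. This is a sketch, not the kernel code: the function names are made up, and a 48-bit virtual address width is assumed for the canonical check.

```python
U64_MASK = (1 << 64) - 1
BNDCFGS_RSVD = 0xFFC                   # bits 11:2 must be zero
ADDR_MASK = ~0xFFF & U64_MASK          # bits 63:12 hold the linear address


def is_canonical(addr, vaddr_bits=48):
    """True if bits 63:(vaddr_bits-1) are all equal (assumed 48-bit width)."""
    upper = (addr & U64_MASK) >> (vaddr_bits - 1)
    all_ones = (1 << (64 - vaddr_bits + 1)) - 1
    return upper in (0, all_ones)


def bndcfgs_write_ok(value):
    """True if a write of value would be accepted; False models a #GP."""
    if value & BNDCFGS_RSVD:
        return False
    return is_canonical(value & ADDR_MASK)
```

For instance, `bndcfgs_write_ok(0x4)` is rejected because bit 2 is reserved, while `0x3` (bits 1:0 only) passes.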

kvm: x86: Guest BNDCFGS requires guest MPX support · 4439af9f

Jim Mattson authored May 24, 2017

The BNDCFGS MSR should only be exposed to the guest if the guest
supports MPX. (cf. the TSC_AUX MSR and RDTSCP.)

Fixes: 0dd376e7 ("KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save")
Change-Id: I3ad7c01bda616715137ceac878f3fa7e66b6b387
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

4439af9f

kvm: vmx: Do not disable intercepts for BNDCFGS · a8b6fda3

Jim Mattson authored May 23, 2017

The MSR permission bitmaps are shared by all VMs. However, some VMs
may not be configured to support MPX, even when the host does. If the
host supports MPX and the guest does not, we should intercept accesses
to the BNDCFGS MSR, so that we can synthesize a #GP
fault. Furthermore, if the host does not support MPX and the
"ignore_msrs" kvm kernel parameter is set, then we should intercept
accesses to the BNDCFGS MSR, so that we can skip over the rdmsr/wrmsr
without raising a #GP fault.

Fixes: da8999d3 ("KVM: x86: Intel MPX vmx and msr handle")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

a8b6fda3

01 Jun, 2017 5 commits

KVM: x86: avoid large stack allocations in em_fxrstor · 9d643f63

Nick Desaulniers authored May 30, 2017

em_fxrstor previously called fxrstor_fixup.  Both created instances of
struct fxregs_state on the stack, which triggered the warning:

arch/x86/kvm/emulate.c:4018:12: warning: stack frame size of 1080 bytes
in function
      'em_fxrstor' [-Wframe-larger-than=]
static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
           ^
with CONFIG_FRAME_WARN set to 1024.

This patch does the fixup in em_fxrstor now, avoiding one additional
struct fxregs_state, and now fxrstor_fixup can be removed as it has no
other call sites.

Further, the calculation for offsets into xmm_space can be shared
between em_fxrstor and em_fxsave.
Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com>
[Clean up calculation of offsets and fix it for 64-bit mode. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9d643f63

KVM: white space cleanup in nested_vmx_setup_ctls_msrs() · 7461fbc4

Dan Carpenter authored May 18, 2017

This should have been indented one more character over and it should use
tabs.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

7461fbc4

KVM: Tidy the whitespace in nested_svm_check_permissions() · e9196ceb

Dan Carpenter authored May 18, 2017

I moved the || to the line before.  Also I replaced some spaces with a
tab on the "return 0;" line.  It looks OK in the diff but originally
that line was only indented 7 spaces.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

e9196ceb

KVM: x86: Fix nmi injection failure when vcpu got blocked · 47a66eed

ZhuangYanying authored May 26, 2017

When a spin_lock_irqsave() deadlock occurs inside the guest, the vcpu
threads other than the lock-holding one enter the S state because of
pvspinlock. If an NMI is then injected via the libvirt API "inject-nmi",
the NMI cannot be injected into the VM.

The reason is:
1. nmi_queued is set to 1 when QEMU calls the KVM_NMI ioctl, and
cpu->kvm_vcpu_dirty is set to true in do_inject_external_nmi() at the
same time.
2. nmi_queued is set to 0 in process_nmi(), before entering the guest,
because cpu->kvm_vcpu_dirty is true.

It's not enough just to check nmi_queued to decide whether to stay in
vcpu_block() or not. An NMI should be injected immediately in any situation.
Add a check of nmi_pending, and replace the nmi_queued check with a test of
KVM_REQ_NMI in kvm_vcpu_has_events().

Do the same change for SMIs.
Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

47a66eed

KVM: SVM: do not zero out segment attributes if segment is unusable or not present · d9c1b543

Roman Pen authored Jun 01, 2017

This is a fix for the problem [1], where VMCB.CPL was set to 0 and an
interrupt was taken on the userspace stack.  The root cause lies in specific
AMD CPU behaviour which manifests itself as unusable segment attributes on
SYSRET.  The corresponding workaround for the kernel is the following:

61f01dd9 ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue")

In turn, the virtualization side treated the unusable segment incorrectly and
restored CPL from the SS attributes, which were zeroed out a few lines above.

This patch only ensures that the P bit is cleared in the VMCB.save state;
the segment attributes are not zeroed out if the segment is not present or is
unusable, so CPL can be safely restored from the DPL field.

This is only one part of the fix, since the QEMU side should be fixed
accordingly so that it does not zero out the attributes either.  A
corresponding patch will follow.

[1] Message id: CAJrWOzD6Xq==b-zYCDdFLgSRMPM-NkNuTSDFEtX=7MreT45i7Q@mail.gmail.com
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Mikhail Sennikovskii <mikhail.sennikovskii@profitbricks.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d9c1b543

30 May, 2017 2 commits

KVM: SVM: ignore type when setting segment registers · 8eae9570

Gioh Kim authored May 30, 2017

Commit 19bca6ab ("KVM: SVM: Fix cross vendor migration issue with
unusable bit") added a check of the type when setting unusable, so
unusable can be set if present is 0 OR type is 0.
According to the AMD processor manual, long mode ignores the type value
in the segment descriptor, and type can be 0 for a read-only data segment.
Therefore the type value is not related to the unusable flag.

This patch is based on linux-next v4.12.0-rc3.
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8eae9570
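
The change in the unusable condition can be written as two predicates. This is a model of the logic described in the message, with hypothetical function names, not the SVM code itself.

```python
def unusable_before(present, seg_type):
    """Condition described for commit 19bca6ab: not present OR type is 0."""
    return (not present) or seg_type == 0


def unusable_after(present, seg_type):
    """Condition after this patch: the type value is ignored."""
    return not present


# A present read-only data segment has type 0 and must stay usable.
was_unusable = unusable_before(present=True, seg_type=0)
is_unusable = unusable_after(present=True, seg_type=0)
```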

KVM: nVMX: fix nested_vmx_check_vmptr failure paths under debugging · cbf71279

Radim Krčmář authored May 19, 2017

kvm_skip_emulated_instruction() will return 0 if userspace is
single-stepping the guest.

kvm_skip_emulated_instruction() uses return status convention of exit
handler: 0 means "exit to userspace" and 1 means "continue vm entries".
The problem is that nested_vmx_check_vmptr() return status means
something else: 0 is ok, 1 is error.

This means we would continue executing after a failure.  Static checker
noticed it because vmptr was not initialized.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 6affcbed ("KVM: x86: Add kvm_skip_emulated_instruction and use it.")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cbf71279

26 May, 2017 3 commits

KVM: x86: Fix virtual wire mode · 52b54190

Jan H. Schönherr authored May 20, 2017

The Intel SDM says that at most one LAPIC should be configured with ExtINT
delivery. KVM configures all LAPICs this way. This causes pic_unlock()
to kick the first available vCPU from the internal KVM data structures.
If this vCPU is not the BSP, but some not-yet-booted AP, the BSP may
never realize that there is an interrupt.

Fix that by enabling ExtINT delivery only for the BSP.

This allows booting a Linux guest without a TSC in the above situation.
Otherwise the BSP gets stuck in calibrate_delay_converge().
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

52b54190

KVM: nVMX: Fix handling of lmsw instruction · e1d39b17

Jan H. Schönherr authored May 20, 2017

The decision whether or not to exit from L2 to L1 on an lmsw instruction is
based on bogus values: instead of using the information encoded within the
exit qualification, it uses the data also used for the mov-to-cr
instruction, which boils down to using whatever is in %eax at that point.

Use the correct values instead.

Without this fix, an L1 may not get notified when a 32-bit Linux L2
switches its secondary CPUs to protected mode; the L1 is only notified on
the next modification of CR0. This short time window poses a problem, when
there is some other reason to exit to L1 in between. Then, L2 will be
resumed in real mode and chaos ensues.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e1d39b17

KVM: X86: Fix preempt the preemption timer cancel · 5acc1ca4

Wanpeng Li authored May 20, 2017

Preemption can occur while the preemption timer is being cancelled, which
leaves inconsistent state in the lapic, vmx and vmcs fields.

          CPU0                    CPU1

  preemption timer vmexit
  handle_preemption_timer(vCPU0)
    kvm_lapic_expired_hv_timer
      vmx_cancel_hv_timer
        vmx->hv_deadline_tsc = -1
        vmcs_clear_bits
        /* hv_timer_in_use still true */
  sched_out
                           sched_in
                           kvm_arch_vcpu_load
                             vmx_set_hv_timer
                               write vmx->hv_deadline_tsc
                               vmcs_set_bits
                           /* back in kvm_lapic_expired_hv_timer */
                           hv_timer_in_use = false
                           ...
                           vmx_vcpu_run
                             vmx_arm_hv_timer
                               write preemption timer deadline
                             spurious preemption timer vmexit
                               handle_preemption_timer(vCPU0)
                                 kvm_lapic_expired_hv_timer
                                   WARN_ON(!apic->lapic_timer.hv_timer_in_use);

This can be reproduced sporadically during boot of L2 on a
preemptible L1, causing a splat on L1.

 WARNING: CPU: 3 PID: 1952 at arch/x86/kvm/lapic.c:1529 kvm_lapic_expired_hv_timer+0xb5/0xd0 [kvm]
 CPU: 3 PID: 1952 Comm: qemu-system-x86 Not tainted 4.12.0-rc1+ #24 RIP: 0010:kvm_lapic_expired_hv_timer+0xb5/0xd0 [kvm]
  Call Trace:
  handle_preemption_timer+0xe/0x20 [kvm_intel]
  vmx_handle_exit+0xc9/0x15f0 [kvm_intel]
  ? lock_acquire+0xdb/0x250
  ? lock_acquire+0xdb/0x250
  ? kvm_arch_vcpu_ioctl_run+0xdf3/0x1ce0 [kvm]
  kvm_arch_vcpu_ioctl_run+0xe55/0x1ce0 [kvm]
  kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
  ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
  ? __fget+0xf3/0x210
  do_vfs_ioctl+0xa4/0x700
  ? __fget+0x114/0x210
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x8f/0x750
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL64_slow_path+0x25/0x25

This patch fixes it by disabling preemption while cancelling the
preemption timer.  This way cancel_hv_timer is atomic with
respect to kvm_arch_vcpu_load.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5acc1ca4

22 May, 2017 2 commits

Linux 4.12-rc2 · 08332893

Linus Torvalds authored May 21, 2017

08332893

x86: fix 32-bit case of __get_user_asm_u64() · 33c9e972

Linus Torvalds authored May 21, 2017

The code to fetch a 64-bit value from user space was entirely buggered,
and has been since the code was merged in early 2016 in commit
b2f68038 ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit
kernels").

Happily the buggered routine is almost certainly entirely unused, since
the normal way to access user space memory is just with the non-inlined
"get_user()", and the inlined version didn't even historically exist.

The normal "get_user()" case is handled by external hand-written asm in
arch/x86/lib/getuser.S that doesn't have either of these issues.

There were two independent bugs in __get_user_asm_u64():

 - it still did the STAC/CLAC user space access marking, even though
   that is now done by the wrapper macros, see commit 11f1a4b9
   ("x86: reorganize SMAP handling in user space accesses").

   This didn't result in a semantic error, it just means that the
   inlined optimized version was hugely less efficient than the
   allegedly slower standard version, since the CLAC/STAC overhead is
   quite high on modern Intel CPUs.

 - the double register %eax/%edx was marked as an output, but the %eax
   part of it was touched early in the asm, and could thus clobber other
   inputs to the asm that gcc didn't expect it to touch.

   In particular, that meant that the generated code could look like
   this:

        mov    (%eax),%eax
        mov    0x4(%eax),%edx

   where the load of %edx obviously was _supposed_ to be from the 32-bit
   word that followed the source of %eax, but because %eax was
   overwritten by the first instruction, the source of %edx was
   basically random garbage.

The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark
the 64-bit output as early-clobber to let gcc know that no inputs should
alias with the output register.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@kernel.org   # v4.8+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

33c9e972
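
The second bug can be replayed with a toy register model. This is an illustration only: memory is a dict of 32-bit words keyed by address, unmapped reads return 0 as a stand-in for "random garbage", and both function names are made up.

```python
def load_u64_buggy(mem, ptr):
    """Model of: mov (%eax),%eax ; mov 0x4(%eax),%edx"""
    eax = ptr
    eax = mem.get(eax, 0)      # the low word overwrites the pointer in %eax
    edx = mem.get(eax + 4, 0)  # the high word is read from a bogus address
    return (edx << 32) | eax


def load_u64_fixed(mem, ptr):
    """Same loads with the pointer kept in a register distinct from the output."""
    src = ptr
    eax = mem.get(src, 0)
    edx = mem.get(src + 4, 0)
    return (edx << 32) | eax


mem = {0x1000: 0x11111111, 0x1004: 0x22222222}
wrong = load_u64_buggy(mem, 0x1000)
right = load_u64_fixed(mem, 0x1000)
```

Marking the 64-bit output as early-clobber tells the compiler not to place the pointer in the output pair, which corresponds to the second function.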

21 May, 2017 3 commits

Clean up x86 unsafe_get/put_user() type handling · 334a023e

Linus Torvalds authored May 21, 2017

Al noticed that unsafe_put_user() had type problems, and fixed them in
commit a7cc722f ("fix unsafe_put_user()"), which made me look more
at those functions.

It turns out that unsafe_get_user() had a type issue too: it limited the
largest size of the type it could handle to "unsigned long".  Which is
fine with the current users, but doesn't match our existing normal
get_user() semantics, which can also handle "u64" even when that does
not fit in a long.

While at it, also clean up the type cast in unsafe_put_user().  We
actually want to just make it an assignment to the expected type of the
pointer, because we actually do want warnings from types that don't
convert silently.  And it makes the code more readable by not having
that one very long and complex line.

[ This patch might become stable material if we ever end up back-porting
  any new users of the unsafe uaccess code, but as things stand now this
  doesn't matter for any current existing uses. ]

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

334a023e

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f3926e4c

Linus Torvalds authored May 21, 2017

Pull misc uaccess fixes from Al Viro:
 "Fix for unsafe_put_user() (no callers currently in mainline, but
  anyone starting to use it will step into that) + alpha osf_wait4()
  infoleak fix"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  osf_wait4(): fix infoleak
  fix unsafe_put_user()

f3926e4c

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 970c305a

Linus Torvalds authored May 21, 2017

Pull scheduler fix from Thomas Gleixner:
 "A single scheduler fix:

  Prevent idle task from ever being preempted. That makes sure that
  synchronize_rcu_tasks() which is ignoring idle task does not pretend
  that no task is stuck in preempted state. If that happens and idle was
  preempted on a ftrace trampoline the machine crashes due to
  inconsistent state"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Call __schedule() from do_idle() without enabling preemption

970c305a