1. 27 Nov, 2018 11 commits
    • Liran Alon's avatar
      KVM: nVMX: Verify eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD · 72aeb60c
      Liran Alon authored
      According to TLFS section 16.11.2 Enlightened VMCS, the first u32
      field of eVMCS should specify eVMCS VersionNumber.
      
      This version should be in the range of supported eVMCS versions exposed
      to guest via CPUID.0x4000000A.EAX[0:15].
      The range which KVM expose to guest in this CPUID field should be the
      same as the value returned in vmcs_version by nested_enable_evmcs().
      
      According to the above, eVMCS VMPTRLD should verify that version specified
      in given eVMCS is in the supported range. However, current code
      mistakenly verfies this field against VMCS12_REVISION.
      
      One can also see that when KVM use eVMCS, it makes sure that
      alloc_vmcs_cpu() sets allocated eVMCS revision_id to KVM_EVMCS_VERSION.
      
      Obvious fix should just change eVMCS VMPTRLD to verify first u32 field
      of eVMCS is equal to KVM_EVMCS_VERSION.
      However, it turns out that Microsoft Hyper-V fails to comply to their
      own invented interface: When Hyper-V use eVMCS, it just sets first u32
      field of eVMCS to revision_id specified in MSR_IA32_VMX_BASIC (In our
      case: VMCS12_REVISION). Instead of used eVMCS version number which is
      one of the supported versions specified in CPUID.0x4000000A.EAX[0:15].
      To overcome Hyper-V bug, we accept either a supported eVMCS version
      or VMCS12_REVISION as valid values for first u32 field of eVMCS.
      
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: default avatarNikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: default avatarMark Kanda <mark.kanda@oracle.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      72aeb60c
    • Leonid Shatz's avatar
      KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset · 326e7425
      Leonid Shatz authored
      Since commit e79f245d ("X86/KVM: Properly update 'tsc_offset' to
      represent the running guest"), vcpu->arch.tsc_offset meaning was
      changed to always reflect the tsc_offset value set on active VMCS.
      Regardless if vCPU is currently running L1 or L2.
      
      However, above mentioned commit failed to also change
      kvm_vcpu_write_tsc_offset() to set vcpu->arch.tsc_offset correctly.
      This is because vmx_write_tsc_offset() could set the tsc_offset value
      in active VMCS to given offset parameter *plus vmcs12->tsc_offset*.
      However, kvm_vcpu_write_tsc_offset() just sets vcpu->arch.tsc_offset
      to given offset parameter. Without taking into account the possible
      addition of vmcs12->tsc_offset. (Same is true for SVM case).
      
      Fix this issue by changing kvm_x86_ops->write_tsc_offset() to return
      actually set tsc_offset in active VMCS and modify
      kvm_vcpu_write_tsc_offset() to set returned value in
      vcpu->arch.tsc_offset.
      In addition, rename write_tsc_offset() callback to write_l1_tsc_offset()
      to make it clear that it is meant to set L1 TSC offset.
      
      Fixes: e79f245d ("X86/KVM: Properly update 'tsc_offset' to represent the running guest")
      Reviewed-by: default avatarLiran Alon <liran.alon@oracle.com>
      Reviewed-by: default avatarMihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: default avatarKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: default avatarLeonid Shatz <leonid.shatz@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      326e7425
    • Yi Wang's avatar
      x86/kvm/vmx: fix old-style function declaration · 1e4329ee
      Yi Wang authored
      The inline keyword which is not at the beginning of the function
      declaration may trigger the following build warnings, so let's fix it:
      
      arch/x86/kvm/vmx.c:1309:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
      arch/x86/kvm/vmx.c:5947:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
      arch/x86/kvm/vmx.c:5985:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
      arch/x86/kvm/vmx.c:6023:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
      Signed-off-by: default avatarYi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1e4329ee
    • Yi Wang's avatar
      KVM: x86: fix empty-body warnings · 354cb410
      Yi Wang authored
      We get the following warnings about empty statements when building
      with 'W=1':
      
      arch/x86/kvm/lapic.c:632:53: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      arch/x86/kvm/lapic.c:1907:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      arch/x86/kvm/lapic.c:1936:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      arch/x86/kvm/lapic.c:1975:44: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      
      Rework the debug helper macro to get rid of these warnings.
      Signed-off-by: default avatarYi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      354cb410
    • Liran Alon's avatar
      KVM: VMX: Update shared MSRs to be saved/restored on MSR_EFER.LMA changes · f48b4711
      Liran Alon authored
      When guest transitions from/to long-mode by modifying MSR_EFER.LMA,
      the list of shared MSRs to be saved/restored on guest<->host
      transitions is updated (See vmx_set_efer() call to setup_msrs()).
      
      On every entry to guest, vcpu_enter_guest() calls
      vmx_prepare_switch_to_guest(). This function should also take care
      of setting the shared MSRs to be saved/restored. However, the
      function does nothing in case we are already running with loaded
      guest state (vmx->loaded_cpu_state != NULL).
      
      This means that even when guest modifies MSR_EFER.LMA which results
      in updating the list of shared MSRs, it isn't being taken into account
      by vmx_prepare_switch_to_guest() because it happens while we are
      running with loaded guest state.
      
      To fix above mentioned issue, add a flag to mark that the list of
      shared MSRs has been updated and modify vmx_prepare_switch_to_guest()
      to set shared MSRs when running with host state *OR* list of shared
      MSRs has been updated.
      
      Note that this issue was mistakenly introduced by commit
      678e315e ("KVM: vmx: add dedicated utility to access guest's
      kernel_gs_base") because previously vmx_set_efer() always called
      vmx_load_host_state() which resulted in vmx_prepare_switch_to_guest() to
      set shared MSRs.
      
      Fixes: 678e315e ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base")
      Reported-by: default avatarEyal Moscovici <eyal.moscovici@oracle.com>
      Reviewed-by: default avatarMihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f48b4711
    • Liran Alon's avatar
      KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall · bcbfbd8e
      Liran Alon authored
      kvm_pv_clock_pairing() allocates local var
      "struct kvm_clock_pairing clock_pairing" on stack and initializes
      all it's fields besides padding (clock_pairing.pad[]).
      
      Because clock_pairing var is written completely (including padding)
      to guest memory, failure to init struct padding results in kernel
      info-leak.
      
      Fix the issue by making sure to also init the padding with zeroes.
      
      Fixes: 55dd00a7 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall")
      Reported-by: syzbot+a8ef68d71211ba264f56@syzkaller.appspotmail.com
      Reviewed-by: default avatarMark Kanda <mark.kanda@oracle.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bcbfbd8e
    • Liran Alon's avatar
      KVM: nVMX: Fix kernel info-leak when enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS more than once · 7f9ad1df
      Liran Alon authored
      Consider the case that userspace enables KVM_CAP_HYPERV_ENLIGHTENED_VMCS twice:
      1) kvm_vcpu_ioctl_enable_cap() is called to enable
      KVM_CAP_HYPERV_ENLIGHTENED_VMCS which calls nested_enable_evmcs().
      2) nested_enable_evmcs() sets enlightened_vmcs_enabled to true and fills
      vmcs_version which is then copied to userspace.
      3) kvm_vcpu_ioctl_enable_cap() is called again to enable
      KVM_CAP_HYPERV_ENLIGHTENED_VMCS which calls nested_enable_evmcs().
      4) This time nested_enable_evmcs() just returns 0 as
      enlightened_vmcs_enabled is already true. *Without filling
      vmcs_version*.
      5) kvm_vcpu_ioctl_enable_cap() continues as usual and copies
      *uninitialized* vmcs_version to userspace which leads to kernel info-leak.
      
      Fix this issue by simply changing nested_enable_evmcs() to always fill
      vmcs_version output argument. Even when enlightened_vmcs_enabled is
      already set to true.
      
      Note that SVM's nested_enable_evmcs() should not be modified because it
      always returns a non-zero value (-ENODEV) which results in
      kvm_vcpu_ioctl_enable_cap() skipping the copy of vmcs_version to
      userspace (as it should).
      
      Fixes: 57b119da ("KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability")
      Reported-by: syzbot+cfbc368e283d381f8cef@syzkaller.appspotmail.com
      Reviewed-by: default avatarKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7f9ad1df
    • Wei Wang's avatar
      svm: Add mutex_lock to protect apic_access_page_done on AMD systems · 30510387
      Wei Wang authored
      There is a race condition when accessing kvm->arch.apic_access_page_done.
      Due to it, x86_set_memory_region will fail when creating the second vcpu
      for a svm guest.
      
      Add a mutex_lock to serialize the accesses to apic_access_page_done.
      This lock is also used by vmx for the same purpose.
      Signed-off-by: default avatarWei Wang <wawei@amazon.de>
      Signed-off-by: default avatarAmadeusz Juskowiak <ajusk@amazon.de>
      Signed-off-by: default avatarJulian Stecklina <jsteckli@amazon.de>
      Signed-off-by: default avatarSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Reviewed-by: default avatarJoerg Roedel <jroedel@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      30510387
    • Wanpeng Li's avatar
      KVM: X86: Fix scan ioapic use-before-initialization · e97f852f
      Wanpeng Li authored
      Reported by syzkaller:
      
       BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
       PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 7 PID: 5059 Comm: debug Tainted: G           OE     4.19.0-rc5 #16
       RIP: 0010:__lock_acquire+0x1a6/0x1990
       Call Trace:
        lock_acquire+0xdb/0x210
        _raw_spin_lock+0x38/0x70
        kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
        vcpu_enter_guest+0x167e/0x1910 [kvm]
        kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
        kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
        do_vfs_ioctl+0xa5/0x690
        ksys_ioctl+0x6d/0x80
        __x64_sys_ioctl+0x1a/0x20
        do_syscall_64+0x83/0x6e0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr
      and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
      However, irqchip is not initialized by this simple testcase, ioapic/apic
      objects should not be accessed.
      This can be triggered by the following program:
      
          #define _GNU_SOURCE
      
          #include <endian.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <sys/syscall.h>
          #include <sys/types.h>
          #include <unistd.h>
      
          uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};
      
          int main(void)
          {
          	syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
          	long res = 0;
          	memcpy((void*)0x20000040, "/dev/kvm", 9);
          	res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0);
          	if (res != -1)
          		r[0] = res;
          	res = syscall(__NR_ioctl, r[0], 0xae01, 0);
          	if (res != -1)
          		r[1] = res;
          	res = syscall(__NR_ioctl, r[1], 0xae41, 0);
          	if (res != -1)
          		r[2] = res;
          	memcpy(
          			(void*)0x20000080,
          			"\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00"
          			"\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43"
          			"\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33"
          			"\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe"
          			"\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22"
          			"\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb",
          			106);
          	syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080);
          	syscall(__NR_ioctl, r[2], 0xae80, 0);
          	return 0;
          }
      
      This patch fixes it by bailing out scan ioapic if ioapic is not initialized in
      kernel.
      Reported-by: default avatarWei Wu <ww9210@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wei Wu <ww9210@gmail.com>
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e97f852f
    • Wanpeng Li's avatar
      KVM: LAPIC: Fix pv ipis use-before-initialization · 38ab012f
      Wanpeng Li authored
      Reported by syzkaller:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
       PGD 800000040410c067 P4D 800000040410c067 PUD 40410d067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 3 PID: 2567 Comm: poc Tainted: G           OE     4.19.0-rc5 #16
       RIP: 0010:kvm_pv_send_ipi+0x94/0x350 [kvm]
       Call Trace:
        kvm_emulate_hypercall+0x3cc/0x700 [kvm]
        handle_vmcall+0xe/0x10 [kvm_intel]
        vmx_handle_exit+0xc1/0x11b0 [kvm_intel]
        vcpu_enter_guest+0x9fb/0x1910 [kvm]
        kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
        kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
        do_vfs_ioctl+0xa5/0x690
        ksys_ioctl+0x6d/0x80
        __x64_sys_ioctl+0x1a/0x20
        do_syscall_64+0x83/0x6e0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The reason is that the apic map has not yet been initialized, the testcase
      triggers pv_send_ipi interface by vmcall which results in kvm->arch.apic_map
      is dereferenced. This patch fixes it by checking whether or not apic map is
      NULL and bailing out immediately if that is the case.
      
      Fixes: 4180bf1b (KVM: X86: Implement "send IPI" hypercall)
      Reported-by: default avatarWei Wu <ww9210@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wei Wu <ww9210@gmail.com>
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      38ab012f
    • Luiz Capitulino's avatar
      KVM: VMX: re-add ple_gap module parameter · a87c99e6
      Luiz Capitulino authored
      Apparently, the ple_gap parameter was accidentally removed
      by commit c8e88717. Add it
      back.
      Signed-off-by: default avatarLuiz Capitulino <lcapitulino@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: c8e88717Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a87c99e6
  2. 25 Nov, 2018 1 commit
  3. 18 Nov, 2018 23 commits
  4. 16 Nov, 2018 5 commits
    • Linus Torvalds's avatar
      Merge tag 'fsnotify_for_v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 1ce80e0f
      Linus Torvalds authored
      Pull fsnotify fix from Jan Kara:
       "One small fsnotify fix for duplicate events"
      
      * tag 'fsnotify_for_v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        fanotify: fix handling of events on child sub-directory
      1ce80e0f
    • Linus Torvalds's avatar
      Merge tag 'gfs2-4.20.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · e6a2562f
      Linus Torvalds authored
      Pull bfs2 fixes from Andreas Gruenbacher:
       "Fix two bugs leading to leaked buffer head references:
      
         - gfs2: Put bitmap buffers in put_super
         - gfs2: Fix iomap buffer head reference counting bug
      
        And one bug leading to significant slow-downs when deleting large
        files:
      
         - gfs2: Fix metadata read-ahead during truncate (2)"
      
      * tag 'gfs2-4.20.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Fix iomap buffer head reference counting bug
        gfs2: Fix metadata read-ahead during truncate (2)
        gfs2: Put bitmap buffers in put_super
      e6a2562f
    • Andreas Gruenbacher's avatar
      gfs2: Fix iomap buffer head reference counting bug · c26b5aa8
      Andreas Gruenbacher authored
      GFS2 passes the inode buffer head (dibh) from gfs2_iomap_begin to
      gfs2_iomap_end in iomap->private.  It sets that private pointer in
      gfs2_iomap_get.  Users of gfs2_iomap_get other than gfs2_iomap_begin
      would have to release iomap->private, but this isn't done correctly,
      leading to a leak of buffer head references.
      
      To fix this, move the code for setting iomap->private from
      gfs2_iomap_get to gfs2_iomap_begin.
      
      Fixes: 64bc06bb ("gfs2: iomap buffered write support")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c26b5aa8
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 32e2524a
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
       "This fixes the following issues:
      
         - Potential memory overwrite in simd
      
         - Kernel info leaks in crypto_user
      
         - NULL dereference and use-after-free in hisilicon"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: user - Zeroize whole structure given to user space
        crypto: user - fix leaking uninitialized memory to userspace
        crypto: simd - correctly take reqsize of wrapped skcipher into account
        crypto: hisilicon - Fix reference after free of memories on error path
        crypto: hisilicon - Fix NULL dereference for same dst and src
      32e2524a
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2018-11-16' of git://anongit.freedesktop.org/drm/drm · 4efd3460
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Live from Vancouver, SoC maintainer talk, this weeks drm fixes pull
        for rc3:
      
        omapdrm:
         - regression fixes for the reordering bridge stuff that went into rc1
      
        i915:
         - incorrect EU count fix
         - HPD storm fix
         - MST fix
         - relocation fix for gen4/5
      
        amdgpu:
         - huge page handling fix
         - IH ring setup
         - XGMI aperture setup
         - watermark setup fix
      
        misc:
         - docs and MST fix"
      
      * tag 'drm-fixes-2018-11-16' of git://anongit.freedesktop.org/drm/drm: (23 commits)
        drm/i915: Account for scale factor when calculating initial phase
        drm/i915: Clean up skl_program_scaler()
        drm/i915: Move programming plane scaler to its own function.
        drm/i915/icl: Drop spurious register read from icl_dbuf_slices_update
        drm/i915: fix broadwell EU computation
        drm/amdgpu: fix huge page handling on Vega10
        drm/amd/pp: Fix truncated clock value when set watermark
        drm/amdgpu: fix bug with IH ring setup
        drm/meson: venc: dmt mode must use encp
        drm/amdgpu: set system aperture to cover whole FB region
        drm/i915: Fix hpd handling for pins with two encoders
        drm/i915/execlists: Force write serialisation into context image vs execution
        drm/i915/icl: Fix power well 2 wrt. DC-off toggling order
        drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST
        drm/i915: Fix possible race in intel_dp_add_mst_connector()
        drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
        drm/omap: dsi: Fix missing of_platform_depopulate()
        drm/omap: Move DISPC runtime PM handling to omapdrm
        drm/omap: dsi: Ensure the device is active during probe
        drm/omap: hdmi4: Ensure the device is active during bind
        ...
      4efd3460