1. 20 Apr, 2019 17 commits
  2. 17 Apr, 2019 23 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.19.35 · 4b0e041c
      Greg Kroah-Hartman authored
      4b0e041c
    • Marc Orr's avatar
      KVM: x86: nVMX: fix x2APIC VTPR read intercept · 59bf185a
      Marc Orr authored
      commit c73f4c99 upstream.
      
      Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the
      SDM, when "virtualize x2APIC mode" is 1 and "APIC-register
      virtualization" is 0, a RDMSR of 808H should return the VTPR from the
      virtual APIC page.
      
      However, for nested, KVM currently fails to disable the read intercept
      for this MSR. This means that a RDMSR exit takes precedence over
      "virtualize x2APIC mode", and KVM passes through L1's TPR to L2,
      instead of sourcing the value from L2's virtual APIC page.
      
      This patch fixes the issue by disabling the read intercept, in VMCS02,
      for the VTPR when "APIC-register virtualization" is 0.
      
      The issue described above and fix prescribed here, were verified with
      a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC
      mode w/ nested".
      Signed-off-by: default avatarMarc Orr <marcorr@google.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Fixes: c992384b ("KVM: vmx: speed up MSR bitmap merge")
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59bf185a
    • Marc Orr's avatar
      KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887) · 119031be
      Marc Orr authored
      commit acff7847 upstream.
      
      The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the
      x2APIC MSR intercepts with the "virtualize x2APIC mode" MSR. As a
      result, we discovered the potential for a buggy or malicious L1 to get
      access to L0's x2APIC MSRs, via an L2, as follows.
      
      1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl
      variable, in nested_vmx_prepare_msr_bitmap() to become true.
      2. L1 disables "virtualize x2APIC mode" in VMCS12.
      3. L1 enables "APIC-register virtualization" in VMCS12.
      
      Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then
      set "virtualize x2APIC mode" to 0 in VMCS02. Oops.
      
      This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR
      intercepts with VMCS12's "virtualize x2APIC mode" control.
      
      The scenario outlined above and fix prescribed here, were verified with
      a related patch in kvm-unit-tests titled "Add leak scenario to
      virt_x2apic_mode_test".
      
      Note, it looks like this issue may have been introduced inadvertently
      during a merge---see 15303ba5.
      Signed-off-by: default avatarMarc Orr <marcorr@google.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      119031be
    • Erik Schmauss's avatar
      ACPICA: AML interpreter: add region addresses in global list during initialization · f8053df6
      Erik Schmauss authored
      commit 4abb951b upstream.
      
      The table load process omitted adding the operation region address
      range to the global list. This omission is problematic because the OS
      queries the global list to check for address range conflicts before
      deciding which drivers to load. This commit may result in warning
      messages that look like the following:
      
      [    7.871761] ACPI Warning: system_IO range 0x00000428-0x0000042F conflicts with op_region 0x00000400-0x0000047F (\PMIO) (20180531/utaddress-213)
      [    7.871769] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
      
      However, these messages do not signify regressions. It is a result of
      properly adding address ranges within the global address list.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200011Tested-by: default avatarJean-Marc Lenoir <archlinux@jihemel.com>
      Signed-off-by: default avatarErik Schmauss <erik.schmauss@intel.com>
      Cc: All applicable <stable@vger.kernel.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f8053df6
    • Tomohiro Mayama's avatar
      arm64: dts: rockchip: Fix vcc_host1_5v GPIO polarity on rk3328-rock64 · fad502a9
      Tomohiro Mayama authored
      commit a8772e5d upstream.
      
      This patch makes USB ports functioning again.
      
      Fixes: 955bebde ("arm64: dts: rockchip: add rk3328-rock64 board")
      Cc: stable@vger.kernel.org
      Suggested-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarTomohiro Mayama <parly-gh@iris.mystia.org>
      Tested-by: default avatarKatsuhiro Suzuki <katsuhiro@katsuster.net>
      Signed-off-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fad502a9
    • Katsuhiro Suzuki's avatar
      arm64: dts: rockchip: fix vcc_host1_5v pin assign on rk3328-rock64 · c9634759
      Katsuhiro Suzuki authored
      commit ef05bcb6 upstream.
      
      This patch fixes pin assign of vcc_host1_5v. This regulator is
      controlled by USB20_HOST_DRV signal.
      
      ROCK64 schematic says that GPIO0_A2 pin is used as USB20_HOST_DRV.
      GPIO0_D3 pin is for SPDIF_TX_M0.
      Signed-off-by: default avatarKatsuhiro Suzuki <katsuhiro@katsuster.net>
      Signed-off-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9634759
    • Mikulas Patocka's avatar
      dm integrity: fix deadlock with overlapping I/O · aa9ee4b1
      Mikulas Patocka authored
      commit 4ed319c6 upstream.
      
      dm-integrity will deadlock if overlapping I/O is issued to it, the bug
      was introduced by commit 724376a0 ("dm integrity: implement fair
      range locks").  Users rarely use overlapping I/O so this bug went
      undetected until now.
      
      Fix this bug by correcting, likely cut-n-paste, typos in
      ranges_overlap() and also remove a flawed ranges_overlap() check in
      remove_range_unlocked().  This condition could leave unprocessed bios
      hanging on wait_list forever.
      
      Cc: stable@vger.kernel.org # v4.19+
      Fixes: 724376a0 ("dm integrity: implement fair range locks")
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa9ee4b1
    • Ilya Dryomov's avatar
      dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errors · 469b40a4
      Ilya Dryomov authored
      commit eb40c0ac upstream.
      
      Some devices don't use blk_integrity but still want stable pages
      because they do their own checksumming.  Examples include rbd and iSCSI
      when data digests are negotiated.  Stacking DM (and thus LVM) on top of
      these devices results in sporadic checksum errors.
      
      Set BDI_CAP_STABLE_WRITES if any underlying device has it set.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      469b40a4
    • Mikulas Patocka's avatar
      dm: revert 8f50e358 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE") · 4f5c99e0
      Mikulas Patocka authored
      commit 75ae1936 upstream.
      
      The limit was already incorporated to dm-crypt with commit 4e870e94
      ("dm crypt: fix error with too large bios"), so we don't need to apply
      it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is
      wrong anyway because the variable ti->max_io_len it is supposed to be in
      the units of 512-byte sectors not in bytes.
      
      Reduction of the limit to 1048576 sectors could even cause data
      corruption in rare cases - suppose that we have a dm-striped device with
      stripe size 768MiB. The target will call dm_set_target_max_io_len with
      the value 1572864. The buggy code would reduce it to 1048576. Now, the
      dm-core will errorneously split the bios on 1048576-sector boundary
      insetad of 1572864-sector boundary and pass these stripe-crossing bios
      to the striped target.
      
      Cc: stable@vger.kernel.org # v4.16+
      Fixes: 8f50e358 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Acked-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f5c99e0
    • Mikulas Patocka's avatar
      dm integrity: change memcmp to strncmp in dm_integrity_ctr · 30dc4d7b
      Mikulas Patocka authored
      commit 0d74e6a3 upstream.
      
      If the string opt_string is small, the function memcmp can access bytes
      that are beyond the terminating nul character. In theory, it could cause
      segfault, if opt_string were located just below some unmapped memory.
      
      Change from memcmp to strncmp so that we don't read bytes beyond the end
      of the string.
      
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      30dc4d7b
    • Sergey Miroshnichenko's avatar
      PCI: pciehp: Ignore Link State Changes after powering off a slot · 5be6e02c
      Sergey Miroshnichenko authored
      commit 3943af9d upstream.
      
      During a safe hot remove, the OS powers off the slot, which may cause a
      Data Link Layer State Changed event.  The slot has already been set to
      OFF_STATE, so that event results in re-enabling the device, making it
      impossible to safely remove it.
      
      Clear out the Presence Detect Changed and Data Link Layer State Changed
      events when the disabled slot has settled down.
      
      It is still possible to re-enable the device if it remains in the slot
      after pressing the Attention Button by pressing it again.
      
      Fixes the problem that Micah reported below: an NVMe drive power button may
      not actually turn off the drive.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=203237Reported-by: default avatarMicah Parrish <micah.parrish@hpe.com>
      Tested-by: default avatarMicah Parrish <micah.parrish@hpe.com>
      Signed-off-by: default avatarSergey Miroshnichenko <s.miroshnichenko@yadro.com>
      [bhelgaas: changelog, add bugzilla URL]
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: default avatarLukas Wunner <lukas@wunner.de>
      Cc: stable@vger.kernel.org	# v4.19+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5be6e02c
    • Andre Przywara's avatar
      PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller · 250fef8d
      Andre Przywara authored
      commit 9cde402a upstream.
      
      There is a Marvell 88SE9170 PCIe SATA controller I found on a board here.
      Some quick testing with the ARM SMMU enabled reveals that it suffers from
      the same requester ID mixup problems as the other Marvell chips listed
      already.
      
      Add the PCI vendor/device ID to the list of chips which need the
      workaround.
      Signed-off-by: default avatarAndre Przywara <andre.przywara@arm.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      250fef8d
    • Lendacky, Thomas's avatar
      x86/perf/amd: Remove need to check "running" bit in NMI handler · 05626465
      Lendacky, Thomas authored
      commit 3966c3fe upstream.
      
      Spurious interrupt support was added to perf in the following commit, almost
      a decade ago:
      
        63e6be6d ("perf, x86: Catch spurious interrupts after disabling counters")
      
      The two previous patches (resolving the race condition when disabling a
      PMC and NMI latency mitigation) allow for the removal of this older
      spurious interrupt support.
      
      Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap
      is cleared before disabling the PMC, which sets up a race condition. This
      race condition was mitigated by introducing the running bitmap. That race
      condition can be eliminated by first disabling the PMC, waiting for PMC
      reset on overflow and then clearing the bit for the PMC in the active_mask
      bitmap. The NMI handler will not re-enable a disabled counter.
      
      If x86_pmu_stop() is called from the perf NMI handler, the NMI latency
      mitigation support will guard against any unhandled NMI messages.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05626465
    • Lendacky, Thomas's avatar
      x86/perf/amd: Resolve NMI latency issues for active PMCs · 23d39b0a
      Lendacky, Thomas authored
      commit 6d3edaae upstream.
      
      On AMD processors, the detection of an overflowed PMC counter in the NMI
      handler relies on the current value of the PMC. So, for example, to check
      for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not
      overflowed) or 0 (overflowed).
      
      When the perf NMI handler executes it does not know in advance which PMC
      counters have overflowed. As such, the NMI handler will process all active
      PMC counters that have overflowed. NMI latency in newer AMD processors can
      result in multiple overflowed PMC counters being processed in one NMI and
      then a subsequent NMI, that does not appear to be a back-to-back NMI, not
      finding any PMC counters that have overflowed. This may appear to be an
      unhandled NMI resulting in either a panic or a series of messages,
      depending on how the kernel was configured.
      
      To mitigate this issue, add an AMD handle_irq callback function,
      amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq()
      function and upon return perform some additional processing that will
      indicate if the NMI has been handled or would have been handled had an
      earlier NMI not handled the overflowed PMC. Using a per-CPU variable, a
      minimum value of the number of active PMCs or 2 will be set whenever a
      PMC is active. This is used to indicate the possible number of NMIs that
      can still occur. The value of 2 is used for when an NMI does not arrive
      at the LAPIC in time to be collapsed into an already pending NMI. Each
      time the function is called without having handled an overflowed counter,
      the per-CPU value is checked. If the value is non-zero, it is decremented
      and the NMI indicates that it handled the NMI. If the value is zero, then
      the NMI indicates that it did not handle the NMI.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23d39b0a
    • Lendacky, Thomas's avatar
      x86/perf/amd: Resolve race condition when disabling PMC · e5a791b4
      Lendacky, Thomas authored
      commit 914123fa upstream.
      
      On AMD processors, the detection of an overflowed counter in the NMI
      handler relies on the current value of the counter. So, for example, to
      check for overflow on a 48 bit counter, bit 47 is checked to see if it
      is 1 (not overflowed) or 0 (overflowed).
      
      There is currently a race condition present when disabling and then
      updating the PMC. Increased NMI latency in newer AMD processors makes this
      race condition more pronounced. If the counter value has overflowed, it is
      possible to update the PMC value before the NMI handler can run. The
      updated PMC value is not an overflowed value, so when the perf NMI handler
      does run, it will not find an overflowed counter. This may appear as an
      unknown NMI resulting in either a panic or a series of messages, depending
      on how the kernel is configured.
      
      To eliminate this race condition, the PMC value must be checked after
      disabling the counter. Add an AMD function, amd_pmu_disable_all(), that
      will wait for the NMI handler to reset any active and overflowed counter
      after calling x86_pmu_disable_all().
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e5a791b4
    • Alexander Potapenko's avatar
      x86/asm: Use stricter assembly constraints in bitops · 4b004504
      Alexander Potapenko authored
      commit 5b77e95d upstream.
      
      There's a number of problems with how arch/x86/include/asm/bitops.h
      is currently using assembly constraints for the memory region
      bitops are modifying:
      
      1) Use memory clobber in bitops that touch arbitrary memory
      
      Certain bit operations that read/write bits take a base pointer and an
      arbitrarily large offset to address the bit relative to that base.
      Inline assembly constraints aren't expressive enough to tell the
      compiler that the assembly directive is going to touch a specific memory
      location of unknown size, therefore we have to use the "memory" clobber
      to indicate that the assembly is going to access memory locations other
      than those listed in the inputs/outputs.
      
      To indicate that BTR/BTS instructions don't necessarily touch the first
      sizeof(long) bytes of the argument, we also move the address to assembly
      inputs.
      
      This particular change leads to size increase of 124 kernel functions in
      a defconfig build. For some of them the diff is in NOP operations, other
      end up re-reading values from memory and may potentially slow down the
      execution. But without these clobbers the compiler is free to cache
      the contents of the bitmaps and use them as if they weren't changed by
      the inline assembly.
      
      2) Use byte-sized arguments for operations touching single bytes.
      
      Passing a long value to ANDB/ORB/XORB instructions makes the compiler
      treat sizeof(long) bytes as being clobbered, which isn't the case. This
      may theoretically lead to worse code in the case of heavy optimization.
      
      Practical impact:
      
      I've built a defconfig kernel and looked through some of the functions
      generated by GCC 7.3.0 with and without this clobber, and didn't spot
      any miscompilations.
      
      However there is a (trivial) theoretical case where this code leads to
      miscompilation:
      
        https://lkml.org/lkml/2019/3/28/393
      
      using just GCC 8.3.0 with -O2.  It isn't hard to imagine someone writes
      such a function in the kernel someday.
      
      So the primary motivation is to fix an existing misuse of the asm
      directive, which happens to work in certain configurations now, but
      isn't guaranteed to work under different circumstances.
      
      [ --mingo: Added -stable tag because defconfig only builds a fraction
        of the kernel and the trivial testcase looks normal enough to
        be used in existing or in-development code. ]
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: James Y Knight <jyknight@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com
      [ Edited the changelog, tidied up one of the defines. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b004504
    • Rasmus Villemoes's avatar
      x86/asm: Remove dead __GNUC__ conditionals · 356ae4de
      Rasmus Villemoes authored
      commit 88ca66d8 upstream.
      
      The minimum supported gcc version is >= 4.6, so these can be removed.
      Signed-off-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190111084931.24601-1-linux@rasmusvillemoes.dkSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      356ae4de
    • Max Filippov's avatar
      xtensa: fix return_address · f7b778b9
      Max Filippov authored
      commit ada770b1 upstream.
      
      return_address returns the address that is one level higher in the call
      stack than requested in its argument, because level 0 corresponds to its
      caller's return address. Use requested level as the number of stack
      frames to skip.
      
      This fixes the address reported by might_sleep and friends.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7b778b9
    • Mel Gorman's avatar
      sched/fair: Do not re-read ->h_load_next during hierarchical load calculation · cb75a0c5
      Mel Gorman authored
      commit 0e9f0245 upstream.
      
      A NULL pointer dereference bug was reported on a distribution kernel but
      the same issue should be present on mainline kernel. It occured on s390
      but should not be arch-specific.  A partial oops looks like:
      
        Unable to handle kernel pointer dereference in virtual kernel address space
        ...
        Call Trace:
          ...
          try_to_wake_up+0xfc/0x450
          vhost_poll_wakeup+0x3a/0x50 [vhost]
          __wake_up_common+0xbc/0x178
          __wake_up_common_lock+0x9e/0x160
          __wake_up_sync_key+0x4e/0x60
          sock_def_readable+0x5e/0x98
      
      The bug hits any time between 1 hour to 3 days. The dereference occurs
      in update_cfs_rq_h_load when accumulating h_load. The problem is that
      cfq_rq->h_load_next is not protected by any locking and can be updated
      by parallel calls to task_h_load. Depending on the compiler, code may be
      generated that re-reads cfq_rq->h_load_next after the check for NULL and
      then oops when reading se->avg.load_avg. The dissassembly showed that it
      was possible to reread h_load_next after the check for NULL.
      
      While this does not appear to be an issue for later compilers, it's still
      an accident if the correct code is generated. Full locking in this path
      would have high overhead so this patch uses READ_ONCE to read h_load_next
      only once and check for NULL before dereferencing. It was confirmed that
      there were no further oops after 10 days of testing.
      
      As Peter pointed out, it is also necessary to use WRITE_ONCE() to avoid any
      potential problems with store tearing.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarValentin Schneider <valentin.schneider@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Fixes: 68520796 ("sched: Move h_load calculation to task_h_load()")
      Link: https://lkml.kernel.org/r/20190319123610.nsivgf3mjbjjesxb@techsingularity.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb75a0c5
    • Dan Carpenter's avatar
      xen: Prevent buffer overflow in privcmd ioctl · ed3adb56
      Dan Carpenter authored
      commit 42d8644b upstream.
      
      The "call" variable comes from the user in privcmd_ioctl_hypercall().
      It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32)
      elements.  We need to put an upper bound on it to prevent an out of
      bounds access.
      
      Cc: stable@vger.kernel.org
      Fixes: 1246ae0b ("xen: add variable hypercall caller")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed3adb56
    • Will Deacon's avatar
      arm64: backtrace: Don't bother trying to unwind the userspace stack · 84c6c2af
      Will Deacon authored
      commit 1e6f5440 upstream.
      
      Calling dump_backtrace() with a pt_regs argument corresponding to
      userspace doesn't make any sense and our unwinder will simply print
      "Call trace:" before unwinding the stack looking for user frames.
      
      Rather than go through this song and dance, just return early if we're
      passed a user register state.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 1149aad1 ("arm64: Add dump_backtrace() in show_regs")
      Reported-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84c6c2af
    • Peter Geis's avatar
      arm64: dts: rockchip: fix rk3328 rgmii high tx error rate · 1ec54cee
      Peter Geis authored
      commit 6fd8b978 upstream.
      
      Several rk3328 based boards experience high rgmii tx error rates.
      This is due to several pins in the rk3328.dtsi rgmii pinmux that are
      missing a defined pull strength setting.
      This causes the pinmux driver to default to 2ma (bit mask 00).
      
      These pins are only defined in the rk3328.dtsi, and are not listed in
      the rk3328 specification.
      The TRM only lists them as "Reserved"
      (RK3328 TRM V1.1, 3.3.3 Detail Register Description, GRF_GPIO0B_IOMUX,
      GRF_GPIO0C_IOMUX, GRF_GPIO0D_IOMUX).
      However, removal of these pins from the rgmii pinmux definition causes
      the interface to fail to transmit.
      
      Also, the rgmii tx and rx pins defined in the dtsi are not consistent
      with the rk3328 specification, with tx pins currently set to 12ma and
      rx pins set to 2ma.
      
      Fix this by setting tx pins to 8ma and the rx pins to 4ma, consistent
      with the specification.
      Defining the drive strength for the undefined pins eliminated the high
      tx packet error rate observed under heavy data transfers.
      Aligning the drive strength to the TRM values eliminated the occasional
      packet retry errors under iperf3 testing.
      This allows much higher data rates with no recorded tx errors.
      
      Tested on the rk3328-roc-cc board.
      
      Fixes: 52e02d37 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPeter Geis <pgwipeout@gmail.com>
      Signed-off-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ec54cee
    • Will Deacon's avatar
      arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value · 82a30a5d
      Will Deacon authored
      commit 045afc24 upstream.
      
      Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't
      explicitly set the return value on the non-faulting path and instead
      leaves it holding the result of the underlying atomic operation. This
      means that any FUTEX_WAKE_OP atomic operation which computes a non-zero
      value will be reported as having failed. Regrettably, I wrote the buggy
      code back in 2011 and it was upstreamed as part of the initial arm64
      support in 2012.
      
      The reasons we appear to get away with this are:
      
        1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get
           exercised by futex() test applications
      
        2. If the result of the atomic operation is zero, the system call
           behaves correctly
      
        3. Prior to version 2.25, the only operation used by GLIBC set the
           futex to zero, and therefore worked as expected. From 2.25 onwards,
           FUTEX_WAKE_OP is not used by GLIBC at all.
      
      Fix the implementation by ensuring that the return value is either 0
      to indicate that the atomic operation completed successfully, or -EFAULT
      if we encountered a fault when accessing the user mapping.
      
      Cc: <stable@kernel.org>
      Fixes: 6170a974 ("arm64: Atomic operations")
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      82a30a5d