1. 05 Dec, 2018 40 commits
    • Ben Wolsieffer's avatar
      staging: vchiq_arm: fix compat VCHIQ_IOC_AWAIT_COMPLETION · ca0908dd
      Ben Wolsieffer authored
      commit 5a96b2d3 upstream.
      
      The compatibility ioctl wrapper for VCHIQ_IOC_AWAIT_COMPLETION assumes that
      the native ioctl always uses a message buffer and decrements msgbufcount.
      Certain message types do not use a message buffer and in this case
      msgbufcount is not decremented, and completion->header for the message is
      NULL. Because the wrapper unconditionally decrements msgbufcount, the
      calling process may assume that a message buffer has been used even when
      it has not.
      
      This results in a memory leak in the userspace code that interfaces with
      this driver. When msgbufcount is decremented, the userspace code assumes
      that the buffer can be freed though the reference in completion->header,
      which cannot happen when the reference is NULL.
      
      This patch causes the wrapper to only decrement msgbufcount when the
      native ioctl decrements it. Note that we cannot simply copy the native
      ioctl's value of msgbufcount, because the wrapper only retrieves messages
      from the native ioctl one at a time, while userspace may request multiple
      messages.
      
      See https://github.com/raspberrypi/linux/pull/2703 for more discussion of
      this patch.
      
      Fixes: 5569a126 ("staging: vchiq_arm: Add compatibility wrappers for ioctls")
      Signed-off-by: default avatarBen Wolsieffer <benwolsieffer@gmail.com>
      Acked-by: default avatarStefan Wahren <stefan.wahren@i2se.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca0908dd
    • Josef Bacik's avatar
      btrfs: release metadata before running delayed refs · 85df1f9f
      Josef Bacik authored
      We want to release the unused reservation we have since it refills the
      delayed refs reserve, which will make everything go smoother when
      running the delayed refs if we're short on our reservation.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      85df1f9f
    • Richard Genoud's avatar
      dmaengine: at_hdmac: fix module unloading · 39c49a75
      Richard Genoud authored
      commit 77e75fda upstream.
      
      of_dma_controller_free() was not called on module onloading.
      This lead to a soft lockup:
      watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
      Modules linked in: at_hdmac [last unloaded: at_hdmac]
      when of_dma_request_slave_channel() tried to call ofdma->of_dma_xlate().
      
      Cc: stable@vger.kernel.org
      Fixes: bbe89c8e ("at_hdmac: move to generic DMA binding")
      Acked-by: default avatarLudovic Desroches <ludovic.desroches@microchip.com>
      Signed-off-by: default avatarRichard Genoud <richard.genoud@gmail.com>
      Signed-off-by: default avatarVinod Koul <vkoul@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39c49a75
    • Richard Genoud's avatar
      dmaengine: at_hdmac: fix memory leak in at_dma_xlate() · 7e572222
      Richard Genoud authored
      commit 98f5f932 upstream.
      
      The leak was found when opening/closing a serial port a great number of
      time, increasing kmalloc-32 in slabinfo.
      
      Each time the port was opened, dma_request_slave_channel() was called.
      Then, in at_dma_xlate(), atslave was allocated with devm_kzalloc() and
      never freed. (Well, it was free at module unload, but that's not what we
      want).
      So, here, kzalloc is more suited for the job since it has to be freed in
      atc_free_chan_resources().
      
      Cc: stable@vger.kernel.org
      Fixes: bbe89c8e ("at_hdmac: move to generic DMA binding")
      Reported-by: default avatarMario Forner <m.forner@be4energy.com>
      Suggested-by: default avatarAlexandre Belloni <alexandre.belloni@bootlin.com>
      Acked-by: default avatarAlexandre Belloni <alexandre.belloni@bootlin.com>
      Acked-by: default avatarLudovic Desroches <ludovic.desroches@microchip.com>
      Signed-off-by: default avatarRichard Genoud <richard.genoud@gmail.com>
      Signed-off-by: default avatarVinod Koul <vkoul@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e572222
    • Heiko Stuebner's avatar
      ARM: dts: rockchip: Remove @0 from the veyron memory node · 2a9443a9
      Heiko Stuebner authored
      commit 672e60b7 upstream.
      
      The Coreboot version on veyron ChromeOS devices seems to ignore
      memory@0 nodes when updating the available memory and instead
      inserts another memory node without the address.
      
      This leads to 4GB systems only ever be using 2GB as the memory@0
      node takes precedence. So remove the @0 for veyron devices.
      
      Fixes: 0b639b81 ("ARM: dts: rockchip: Add missing unit name to memory nodes in rk3288 boards")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarHeikki Lindholm <holin@iki.fi>
      Signed-off-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2a9443a9
    • Pan Bian's avatar
      ext2: fix potential use after free · 086d1f60
      Pan Bian authored
      commit ecebf55d upstream.
      
      The function ext2_xattr_set calls brelse(bh) to drop the reference count
      of bh. After that, bh may be freed. However, following brelse(bh),
      it reads bh->b_data via macro HDR(bh). This may result in a
      use-after-free bug. This patch moves brelse(bh) after reading field.
      
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarPan Bian <bianpan2016@163.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      086d1f60
    • Anisse Astier's avatar
      ALSA: hda/realtek - fix headset mic detection for MSI MS-B171 · e2ec0cb4
      Anisse Astier authored
      commit 8cd65271 upstream.
      
      MSI Cubi N 8GL (MS-B171) needs the same fixup as its older model, the
      MS-B120, in order for the headset mic to be properly detected.
      
      They both use a single 3-way jack for both mic and headset with an
      ALC283 codec, with the same pins used.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAnisse Astier <anisse@astier.eu>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2ec0cb4
    • Kailang Yang's avatar
      ALSA: hda/realtek - Support ALC300 · e50476e9
      Kailang Yang authored
      commit 1078bef0 upstream.
      
      This patch will enable ALC300.
      
      [ It's almost equivalent with other ALC269-compatible ones, and
        apparently has no loopback mixer -- tiwai ]
      Signed-off-by: default avatarKailang Yang <kailang@realtek.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e50476e9
    • Takashi Iwai's avatar
      ALSA: sparc: Fix invalid snd_free_pages() at error path · 46663071
      Takashi Iwai authored
      commit 9a20332a upstream.
      
      Some spurious calls of snd_free_pages() have been overlooked and
      remain in the error paths of sparc cs4231 driver code.  Since
      runtime->dma_area is managed by the PCM core helper, we shouldn't
      release manually.
      
      Drop the superfluous calls.
      Reviewed-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      46663071
    • Takashi Iwai's avatar
      ALSA: control: Fix race between adding and removing a user element · 73ce314c
      Takashi Iwai authored
      commit e1a7bfe3 upstream.
      
      The procedure for adding a user control element has some window opened
      for race against the concurrent removal of a user element.  This was
      caught by syzkaller, hitting a KASAN use-after-free error.
      
      This patch addresses the bug by wrapping the whole procedure to add a
      user control element with the card->controls_rwsem, instead of only
      around the increment of card->user_ctl_count.
      
      This required a slight code refactoring, too.  The function
      snd_ctl_add() is split to two parts: a core function to add the
      control element and a part calling it.  The former is called from the
      function for adding a user control element inside the controls_rwsem.
      
      One change to be noted is that snd_ctl_notify() for adding a control
      element gets called inside the controls_rwsem as well while it was
      called outside the rwsem.  But this should be OK, as snd_ctl_notify()
      takes another (finer) rwlock instead of rwsem, and the call of
      snd_ctl_notify() inside rwsem is already done in another code path.
      
      Reported-by: syzbot+dc09047bce3820621ba2@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      73ce314c
    • Takashi Iwai's avatar
      ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write · ef8944bf
      Takashi Iwai authored
      commit 7194eda1 upstream.
      
      The function snd_ac97_put_spsa() gets the bit shift value from the
      associated private_value, but it extracts too much; the current code
      extracts 8 bit values in bits 8-15, but this is a combination of two
      nibbles (bits 8-11 and bits 12-15) for left and right shifts.
      Due to the incorrect bits extraction, the actual shift may go beyond
      the 32bit value, as spotted recently by UBSAN check:
       UBSAN: Undefined behaviour in sound/pci/ac97/ac97_codec.c:836:7
       shift exponent 68 is too large for 32-bit type 'int'
      
      This patch fixes the shift value extraction by masking the properly
      with 0x0f instead of 0xff.
      Reported-and-tested-by: default avatarMeelis Roos <mroos@linux.ee>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef8944bf
    • Takashi Iwai's avatar
      ALSA: wss: Fix invalid snd_free_pages() at error path · 0d542d58
      Takashi Iwai authored
      commit 7b691541 upstream.
      
      Some spurious calls of snd_free_pages() have been overlooked and
      remain in the error paths of wss driver code.  Since runtime->dma_area
      is managed by the PCM core helper, we shouldn't release manually.
      
      Drop the superfluous calls.
      Reviewed-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d542d58
    • Maximilian Heyne's avatar
      fs: fix lost error code in dio_complete · b7c769eb
      Maximilian Heyne authored
      commit 41e817bc upstream.
      
      commit e2592217 ("fs: simplify the
      generic_write_sync prototype") reworked callers of generic_write_sync(),
      and ended up dropping the error return for the directio path. Prior to
      that commit, in dio_complete(), an error would be bubbled up the stack,
      but after that commit, errors passed on to dio_complete were eaten up.
      
      This was reported on the list earlier, and a fix was proposed in
      https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/, but
      never followed up with.  We recently hit this bug in our testing where
      fencing io errors, which were previously erroring out with EIO, were
      being returned as success operations after this commit.
      
      The fix proposed on the list earlier was a little short -- it would have
      still called generic_write_sync() in case `ret` already contained an
      error. This fix ensures generic_write_sync() is only called when there's
      no pending error in the write. Additionally, transferred is replaced
      with ret to bring this code in line with other callers.
      
      Fixes: e2592217 ("fs: simplify the generic_write_sync prototype")
      Reported-by: default avatarRavi Nankani <rnankani@amazon.com>
      Signed-off-by: default avatarMaximilian Heyne <mheyne@amazon.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      CC: Torsten Mehlan <tomeh@amazon.de>
      CC: Uwe Dannowski <uwed@amazon.de>
      CC: Amit Shah <aams@amazon.de>
      CC: David Woodhouse <dwmw@amazon.co.uk>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7c769eb
    • Jiri Olsa's avatar
      perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts() · ecef7c1a
      Jiri Olsa authored
      commit 67266c10 upstream.
      
      Currently we check the branch tracing only by checking for the
      PERF_COUNT_HW_BRANCH_INSTRUCTIONS event of PERF_TYPE_HARDWARE
      type. But we can define the same event with the PERF_TYPE_RAW
      type.
      
      Changing the intel_pmu_has_bts() code to check on event's final
      hw config value, so both HW types are covered.
      
      Adding unlikely to intel_pmu_has_bts() condition calls, because
      it was used in the original code in intel_bts_constraints.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20181121101612.16272-2-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ecef7c1a
    • Jiri Olsa's avatar
      perf/x86/intel: Move branch tracing setup to the Intel-specific source file · fae1bec5
      Jiri Olsa authored
      commit ed6101bb upstream.
      
      Moving branch tracing setup to Intel core object into separate
      intel_pmu_bts_config function, because it's Intel specific.
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20181121101612.16272-1-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fae1bec5
    • Sebastian Andrzej Siewior's avatar
      x86/fpu: Disable bottom halves while loading FPU registers · e8499ab5
      Sebastian Andrzej Siewior authored
      commit 68239654 upstream.
      
      The sequence
      
        fpu->initialized = 1;		/* step A */
        preempt_disable();		/* step B */
        fpu__restore(fpu);
        preempt_enable();
      
      in __fpu__restore_sig() is racy in regard to a context switch.
      
      For 32bit frames, __fpu__restore_sig() prepares the FPU state within
      fpu->state. To ensure that a context switch (switch_fpu_prepare() in
      particular) does not modify fpu->state it uses fpu__drop() which sets
      fpu->initialized to 0.
      
      After fpu->initialized is cleared, the CPU's FPU state is not saved
      to fpu->state during a context switch. The new state is loaded via
      fpu__restore(). It gets loaded into fpu->state from userland and
      ensured it is sane. fpu->initialized is then set to 1 in order to avoid
      fpu__initialize() doing anything (overwrite the new state) which is part
      of fpu__restore().
      
      A context switch between step A and B above would save CPU's current FPU
      registers to fpu->state and overwrite the newly prepared state. This
      looks like a tiny race window but the Kernel Test Robot reported this
      back in 2016 while we had lazy FPU support. Borislav Petkov made the
      link between that report and another patch that has been posted. Since
      the removal of the lazy FPU support, this race goes unnoticed because
      the warning has been removed.
      
      Disable bottom halves around the restore sequence to avoid the race. BH
      need to be disabled because BH is allowed to run (even with preemption
      disabled) and might invoke kernel_fpu_begin() by doing IPsec.
      
       [ bp: massage commit message a bit. ]
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Cc: x86-ml <x86@kernel.org>
      Link: http://lkml.kernel.org/r/20181120102635.ddv3fvavxajjlfqk@linutronix.de
      Link: https://lkml.kernel.org/r/20160226074940.GA28911@pd.tnicSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8499ab5
    • Borislav Petkov's avatar
      x86/MCE/AMD: Fix the thresholding machinery initialization order · 855eefd9
      Borislav Petkov authored
      commit 60c8144a upstream.
      
      Currently, the code sets up the thresholding interrupt vector and only
      then goes about initializing the thresholding banks. Which is wrong,
      because an early thresholding interrupt would cause a NULL pointer
      dereference when accessing those banks and prevent the machine from
      booting.
      
      Therefore, set the thresholding interrupt vector only *after* having
      initialized the banks successfully.
      
      Fixes: 18807ddb ("x86/mce/AMD: Reset Threshold Limit after logging error")
      Reported-by: default avatarRafał Miłecki <rafal@milecki.pl>
      Reported-by: default avatarJohn Clemens <clemej@gmail.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Tested-by: default avatarRafał Miłecki <rafal@milecki.pl>
      Tested-by: default avatarJohn Clemens <john@deater.net>
      Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
      Cc: linux-edac@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86@kernel.org
      Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Link: https://lkml.kernel.org/r/20181127101700.2964-1-zajec5@gmail.com
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201291Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      855eefd9
    • Christoph Muellner's avatar
      arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou. · ad953dfd
      Christoph Muellner authored
      commit c1d91f86 upstream.
      
      This patch fixes the wrong polarity setting for the PCIe host driver's
      pre-reset pin for rk3399-puma-haikou. Without this patch link training
      will most likely fail.
      
      Fixes: 60fd9f72 ("arm64: dts: rockchip: add Haikou baseboard with RK3399-Q7 SoM")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristoph Muellner <christoph.muellner@theobroma-systems.com>
      Signed-off-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ad953dfd
    • Hou Zhiqiang's avatar
      PCI: layerscape: Fix wrong invocation of outbound window disable accessor · ee48a9df
      Hou Zhiqiang authored
      commit c6fd6fe9 upstream.
      
      The order of parameters is not correct when invoking the outbound
      window disable routine. Fix it.
      
      Fixes: 4a2745d7 ("PCI: layerscape: Disable outbound windows configured by bootloader")
      Signed-off-by: default avatarHou Zhiqiang <Zhiqiang.Hou@nxp.com>
      [lorenzo.pieralisi@arm.com: commit log]
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee48a9df
    • Pan Bian's avatar
      btrfs: relocation: set trans to be NULL after ending transaction · e380f318
      Pan Bian authored
      commit 42a657f5 upstream.
      
      The function relocate_block_group calls btrfs_end_transaction to release
      trans when update_backref_cache returns 1, and then continues the loop
      body. If btrfs_block_rsv_refill fails this time, it will jump out the
      loop and the freed trans will be accessed. This may result in a
      use-after-free bug. The patch assigns NULL to trans after trans is
      released so that it will not be accessed.
      
      Fixes: 0647bf56 ("Btrfs: improve forever loop when doing balance relocation")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarPan Bian <bianpan2016@163.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e380f318
    • Filipe Manana's avatar
      Btrfs: ensure path name is null terminated at btrfs_control_ioctl · 52fa8eaa
      Filipe Manana authored
      commit f505754f upstream.
      
      We were using the path name received from user space without checking that
      it is null terminated. While btrfs-progs is well behaved and does proper
      validation and null termination, someone could call the ioctl and pass
      a non-null terminated patch, leading to buffer overrun problems in the
      kernel.  The ioctl is protected by CAP_SYS_ADMIN.
      
      So just set the last byte of the path to a null character, similar to what
      we do in other ioctls (add/remove/resize device, snapshot creation, etc).
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52fa8eaa
    • Max Filippov's avatar
      xtensa: fix coprocessor part of ptrace_{get,set}xregs · d36e4ee3
      Max Filippov authored
      commit 38a35a78 upstream.
      
      Layout of coprocessor registers in the elf_xtregs_t and
      xtregs_coprocessor_t may be different due to alignment. Thus it is not
      always possible to copy data between the xtregs_coprocessor_t structure
      and the elf_xtregs_t and get correct values for all registers.
      Use a table of offsets and sizes of individual coprocessor register
      groups to do coprocessor context copying in the ptrace_getxregs and
      ptrace_setxregs.
      This fixes incorrect coprocessor register values reading from the user
      process by the native gdb on an xtensa core with multiple coprocessors
      and registers with high alignment requirements.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d36e4ee3
    • Max Filippov's avatar
      xtensa: fix coprocessor context offset definitions · b3518222
      Max Filippov authored
      commit 03bc996a upstream.
      
      Coprocessor context offsets are used by the assembly code that moves
      coprocessor context between the individual fields of the
      thread_info::xtregs_cp structure and coprocessor registers.
      This fixes coprocessor context clobbering on flushing and reloading
      during normal user code execution and user process debugging in the
      presence of more than one coprocessor in the core configuration.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3518222
    • Max Filippov's avatar
      xtensa: enable coprocessors that are being flushed · d039837a
      Max Filippov authored
      commit 2958b666 upstream.
      
      coprocessor_flush_all may be called from a context of a thread that is
      different from the thread being flushed. In that case contents of the
      cpenable special register may not match ti->cpenable of the target
      thread, resulting in unhandled coprocessor exception in the kernel
      context.
      Set cpenable special register to the ti->cpenable of the target register
      for the duration of the flush and restore it afterwards.
      This fixes the following crash caused by coprocessor register inspection
      in native gdb:
      
        (gdb) p/x $w0
        Illegal instruction in kernel: sig: 9 [#1] PREEMPT
        Call Trace:
          ___might_sleep+0x184/0x1a4
          __might_sleep+0x41/0xac
          exit_signals+0x14/0x218
          do_exit+0xc9/0x8b8
          die+0x99/0xa0
          do_illegal_instruction+0x18/0x6c
          common_exception+0x77/0x77
          coprocessor_flush+0x16/0x3c
          arch_ptrace+0x46c/0x674
          sys_ptrace+0x2ce/0x3b4
          system_call+0x54/0x80
          common_exception+0x77/0x77
        note: gdb[100] exited with preempt_count 1
        Killed
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d039837a
    • Wanpeng Li's avatar
      KVM: X86: Fix scan ioapic use-before-initialization · 83f00ab9
      Wanpeng Li authored
      commit e97f852f upstream.
      
      Reported by syzkaller:
      
       BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
       PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 7 PID: 5059 Comm: debug Tainted: G           OE     4.19.0-rc5 #16
       RIP: 0010:__lock_acquire+0x1a6/0x1990
       Call Trace:
        lock_acquire+0xdb/0x210
        _raw_spin_lock+0x38/0x70
        kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
        vcpu_enter_guest+0x167e/0x1910 [kvm]
        kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
        kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
        do_vfs_ioctl+0xa5/0x690
        ksys_ioctl+0x6d/0x80
        __x64_sys_ioctl+0x1a/0x20
        do_syscall_64+0x83/0x6e0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr
      and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
      However, irqchip is not initialized by this simple testcase, ioapic/apic
      objects should not be accessed.
      This can be triggered by the following program:
      
          #define _GNU_SOURCE
      
          #include <endian.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <sys/syscall.h>
          #include <sys/types.h>
          #include <unistd.h>
      
          uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};
      
          int main(void)
          {
          	syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
          	long res = 0;
          	memcpy((void*)0x20000040, "/dev/kvm", 9);
          	res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0);
          	if (res != -1)
          		r[0] = res;
          	res = syscall(__NR_ioctl, r[0], 0xae01, 0);
          	if (res != -1)
          		r[1] = res;
          	res = syscall(__NR_ioctl, r[1], 0xae41, 0);
          	if (res != -1)
          		r[2] = res;
          	memcpy(
          			(void*)0x20000080,
          			"\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00"
          			"\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43"
          			"\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33"
          			"\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe"
          			"\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22"
          			"\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb",
          			106);
          	syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080);
          	syscall(__NR_ioctl, r[2], 0xae80, 0);
          	return 0;
          }
      
      This patch fixes it by bailing out scan ioapic if ioapic is not initialized in
      kernel.
      Reported-by: default avatarWei Wu <ww9210@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wei Wu <ww9210@gmail.com>
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      83f00ab9
    • Liran Alon's avatar
      KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall · 08b9a967
      Liran Alon authored
      commit bcbfbd8e upstream.
      
      kvm_pv_clock_pairing() allocates local var
      "struct kvm_clock_pairing clock_pairing" on stack and initializes
      all it's fields besides padding (clock_pairing.pad[]).
      
      Because clock_pairing var is written completely (including padding)
      to guest memory, failure to init struct padding results in kernel
      info-leak.
      
      Fix the issue by making sure to also init the padding with zeroes.
      
      Fixes: 55dd00a7 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall")
      Reported-by: syzbot+a8ef68d71211ba264f56@syzkaller.appspotmail.com
      Reviewed-by: default avatarMark Kanda <mark.kanda@oracle.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      08b9a967
    • Jim Mattson's avatar
      kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb · 57e972ec
      Jim Mattson authored
      commit fd65d314 upstream.
      
      Previously, we only called indirect_branch_prediction_barrier on the
      logical CPU that freed a vmcb. This function should be called on all
      logical CPUs that last loaded the vmcb in question.
      
      Fixes: 15d45071 ("KVM/x86: Add IBPB support")
      Reported-by: default avatarNeel Natu <neelnatu@google.com>
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57e972ec
    • Junaid Shahid's avatar
      kvm: mmu: Fix race in emulated page table writes · a0636152
      Junaid Shahid authored
      commit 0e0fee5c upstream.
      
      When a guest page table is updated via an emulated write,
      kvm_mmu_pte_write() is called to update the shadow PTE using the just
      written guest PTE value. But if two emulated guest PTE writes happened
      concurrently, it is possible that the guest PTE and the shadow PTE end
      up being out of sync. Emulated writes do not mark the shadow page as
      unsync-ed, so this inconsistency will not be resolved even by a guest TLB
      flush (unless the page was marked as unsync-ed at some other point).
      
      This is fixed by re-reading the current value of the guest PTE after the
      MMU lock has been acquired instead of just using the value that was
      written prior to calling kvm_mmu_pte_write().
      Signed-off-by: default avatarJunaid Shahid <junaids@google.com>
      Reviewed-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0636152
    • Thomas Gleixner's avatar
      x86/speculation: Provide IBPB always command line options · 78085d7e
      Thomas Gleixner authored
      commit 55a97402 upstream
      
      Provide the possibility to enable IBPB always in combination with 'prctl'
      and 'seccomp'.
      
      Add the extra command line options and rework the IBPB selection to
      evaluate the command instead of the mode selected by the STIPB switch case.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185006.144047038@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      78085d7e
    • Thomas Gleixner's avatar
      x86/speculation: Add seccomp Spectre v2 user space protection mode · ca97dd00
      Thomas Gleixner authored
      commit 6b3e64c2 upstream
      
      If 'prctl' mode of user space protection from spectre v2 is selected
      on the kernel command-line, STIBP and IBPB are applied on tasks which
      restrict their indirect branch speculation via prctl.
      
      SECCOMP enables the SSBD mitigation for sandboxed tasks already, so it
      makes sense to prevent spectre v2 user space to user space attacks as
      well.
      
      The Intel mitigation guide documents how STIPB works:
          
         Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
         prevents the predicted targets of indirect branches on any logical
         processor of that core from being controlled by software that executes
         (or executed previously) on another logical processor of the same core.
      
      Ergo setting STIBP protects the task itself from being attacked from a task
      running on a different hyper-thread and protects the tasks running on
      different hyper-threads from being attacked.
      
      While the document suggests that the branch predictors are shielded between
      the logical processors, the observed performance regressions suggest that
      STIBP simply disables the branch predictor more or less completely. Of
      course the document wording is vague, but the fact that there is also no
      requirement for issuing IBPB when STIBP is used points clearly in that
      direction. The kernel still issues IBPB even when STIBP is used until Intel
      clarifies the whole mechanism.
      
      IBPB is issued when the task switches out, so malicious sandbox code cannot
      mistrain the branch predictor for the next user space task on the same
      logical processor.
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca97dd00
    • Thomas Gleixner's avatar
      x86/speculation: Enable prctl mode for spectre_v2_user · 605b2828
      Thomas Gleixner authored
      commit 7cc765a6 upstream
      
      Now that all prerequisites are in place:
      
       - Add the prctl command line option
      
       - Default the 'auto' mode to 'prctl'
      
       - When SMT state changes, update the static key which controls the
         conditional STIBP evaluation on context switch.
      
       - At init update the static key which controls the conditional IBPB
         evaluation on context switch.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.958421388@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      605b2828
    • Thomas Gleixner's avatar
      x86/speculation: Add prctl() control for indirect branch speculation · 6a847a60
      Thomas Gleixner authored
      commit 9137bb27 upstream
      
      Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and
      PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of
      indirect branch speculation via STIBP and IBPB.
      
      Invocations:
       Check indirect branch speculation status with
       - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
      
       Enable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
      
       Disable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
      
       Force disable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
      
      See Documentation/userspace-api/spec_ctrl.rst.
      Signed-off-by: default avatarTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a847a60
    • Thomas Gleixner's avatar
      x86/speculation: Prepare arch_smt_update() for PRCTL mode · 99f1cb80
      Thomas Gleixner authored
      commit 6893a959 upstream
      
      The upcoming fine grained per task STIBP control needs to be updated on CPU
      hotplug as well.
      
      Split out the code which controls the strict mode so the prctl control code
      can be added later. Mark the SMP function call argument __unused while at it.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.759457117@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99f1cb80
    • Thomas Gleixner's avatar
      x86/speculation: Prevent stale SPEC_CTRL msr content · e3f822b6
      Thomas Gleixner authored
      commit 6d991ba5 upstream
      
      The seccomp speculation control operates on all tasks of a process, but
      only the current task of a process can update the MSR immediately. For the
      other threads the update is deferred to the next context switch.
      
      This creates the following situation with Process A and B:
      
      Process A task 2 and Process B task 1 are pinned on CPU1. Process A task 2
      does not have the speculation control TIF bit set. Process B task 1 has the
      speculation control TIF bit set.
      
      CPU0					CPU1
      					MSR bit is set
      					ProcB.T1 schedules out
      					ProcA.T2 schedules in
      					MSR bit is cleared
      ProcA.T1
        seccomp_update()
        set TIF bit on ProcA.T2
      					ProcB.T1 schedules in
      					MSR is not updated  <-- FAIL
      
      This happens because the context switch code tries to avoid the MSR update
      if the speculation control TIF bits of the incoming and the outgoing task
      are the same. In the worst case ProcB.T1 and ProcA.T2 are the only tasks
      scheduling back and forth on CPU1, which keeps the MSR stale forever.
      
      In theory this could be remedied by IPIs, but chasing the remote task which
      could be migrated is complex and full of races.
      
      The straight forward solution is to avoid the asychronous update of the TIF
      bit and defer it to the next context switch. The speculation control state
      is stored in task_struct::atomic_flags by the prctl and seccomp updates
      already.
      
      Add a new TIF_SPEC_FORCE_UPDATE bit and set this after updating the
      atomic_flags. Check the bit on context switch and force a synchronous
      update of the speculation control if set. Use the same mechanism for
      updating the current task.
      Reported-by: default avatarTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811272247140.1875@nanos.tec.linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e3f822b6
    • Thomas Gleixner's avatar
      x86/speculation: Split out TIF update · dcb4ac34
      Thomas Gleixner authored
      commit e6da8bb6 upstream
      
      The update of the TIF_SSBD flag and the conditional speculation control MSR
      update is done in the ssb_prctl_set() function directly. The upcoming prctl
      support for controlling indirect branch speculation via STIBP needs the
      same mechanism.
      
      Split the code out and make it reusable. Reword the comment about updates
      for other tasks.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.652305076@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dcb4ac34
    • Thomas Gleixner's avatar
      ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS · dae4d590
      Thomas Gleixner authored
      commit 46f7ecb1 upstream
      
      The IBPB control code in x86 removed the usage. Remove the functionality
      which was introduced for this.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dae4d590
    • Thomas Gleixner's avatar
      x86/speculation: Prepare for conditional IBPB in switch_mm() · cbca99b9
      Thomas Gleixner authored
      commit 4c71a2b6 upstream
      
      The IBPB speculation barrier is issued from switch_mm() when the kernel
      switches to a user space task with a different mm than the user space task
      which ran last on the same CPU.
      
      An additional optimization is to avoid IBPB when the incoming task can be
      ptraced by the outgoing task. This optimization only works when switching
      directly between two user space tasks. When switching from a kernel task to
      a user space task the optimization fails because the previous task cannot
      be accessed anymore. So for quite some scenarios the optimization is just
      adding overhead.
      
      The upcoming conditional IBPB support will issue IBPB only for user space
      tasks which have the TIF_SPEC_IB bit set. This requires to handle the
      following cases:
      
        1) Switch from a user space task (potential attacker) which has
           TIF_SPEC_IB set to a user space task (potential victim) which has
           TIF_SPEC_IB not set.
      
        2) Switch from a user space task (potential attacker) which has
           TIF_SPEC_IB not set to a user space task (potential victim) which has
           TIF_SPEC_IB set.
      
      This needs to be optimized for the case where the IBPB can be avoided when
      only kernel threads ran in between user space tasks which belong to the
      same process.
      
      The current check whether two tasks belong to the same context is using the
      tasks context id. While correct, it's simpler to use the mm pointer because
      it allows to mangle the TIF_SPEC_IB bit into it. The context id based
      mechanism requires extra storage, which creates worse code.
      
      When a task is scheduled out its TIF_SPEC_IB bit is mangled as bit 0 into
      the per CPU storage which is used to track the last user space mm which was
      running on a CPU. This bit can be used together with the TIF_SPEC_IB bit of
      the incoming task to make the decision whether IBPB needs to be issued or
      not to cover the two cases above.
      
      As conditional IBPB is going to be the default, remove the dubious ptrace
      check for the IBPB always case and simply issue IBPB always when the
      process changes.
      
      Move the storage to a different place in the struct as the original one
      created a hole.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.466447057@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cbca99b9
    • Thomas Gleixner's avatar
      x86/speculation: Avoid __switch_to_xtra() calls · ba523588
      Thomas Gleixner authored
      commit 5635d999 upstream
      
      The TIF_SPEC_IB bit does not need to be evaluated in the decision to invoke
      __switch_to_xtra() when:
      
       - CONFIG_SMP is disabled
      
       - The conditional STIPB mode is disabled
      
      The TIF_SPEC_IB bit still controls IBPB in both cases so the TIF work mask
      checks might invoke __switch_to_xtra() for nothing if TIF_SPEC_IB is the
      only set bit in the work masks.
      
      Optimize it out by masking the bit at compile time for CONFIG_SMP=n and at
      run time when the static key controlling the conditional STIBP mode is
      disabled.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.374062201@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba523588
    • Thomas Gleixner's avatar
      x86/process: Consolidate and simplify switch_to_xtra() code · 7fe6a4ba
      Thomas Gleixner authored
      commit ff16701a upstream
      
      Move the conditional invocation of __switch_to_xtra() into an inline
      function so the logic can be shared between 32 and 64 bit.
      
      Remove the handthrough of the TSS pointer and retrieve the pointer directly
      in the bitmap handling function. Use this_cpu_ptr() instead of the
      per_cpu() indirection.
      
      This is a preparatory change so integration of conditional indirect branch
      speculation optimization happens only in one place.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.280855518@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7fe6a4ba
    • Tim Chen's avatar
      x86/speculation: Prepare for per task indirect branch speculation control · 1fe4e69a
      Tim Chen authored
      commit 5bfbe3ad upstream
      
      To avoid the overhead of STIBP always on, it's necessary to allow per task
      control of STIBP.
      
      Add a new task flag TIF_SPEC_IB and evaluate it during context switch if
      SMT is active and flag evaluation is enabled by the speculation control
      code. Add the conditional evaluation to x86_virt_spec_ctrl() as well so the
      guest/host switch works properly.
      
      This has no effect because TIF_SPEC_IB cannot be set yet and the static key
      which controls evaluation is off. Preparatory patch for adding the control
      code.
      
      [ tglx: Simplify the context switch logic and make the TIF evaluation
        	depend on SMP=y and on the static key controlling the conditional
        	update. Rename it to TIF_SPEC_IB because it controls both STIBP and
        	IBPB ]
      Signed-off-by: default avatarTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.176917199@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1fe4e69a