1. 23 Mar, 2019 40 commits
    • kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv · 93c8a44a
      Zev Weiss authored
      commit 8cf7630b upstream.
      
      This bug has apparently existed since the introduction of this function
      in the pre-git era (4500e917 in Thomas Gleixner's history.git,
      "[NET]: Add proc_dointvec_userhz_jiffies, use it for proper handling of
      neighbour sysctls.").
      
      As a minimal fix we can simply duplicate the corresponding check in
      do_proc_dointvec_conv().
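
      The kind of check described above can be sketched in user space as
      follows. This is only an illustration under assumptions: the function
      names are hypothetical, and the parsed value is assumed to arrive as an
      unsigned magnitude plus a sign flag. The kernel's actual code is not
      reproduced here.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical user-space analogue of a sysctl integer conversion:
 * reject magnitudes that cannot be represented as an int.
 */
static int ulong_to_int_checked(bool negative, unsigned long magnitude, int *out)
{
	if (negative) {
		/* INT_MIN has magnitude INT_MAX + 1 */
		if (magnitude > (unsigned long)INT_MAX + 1)
			return -EINVAL;
		*out = (int)(-(long long)magnitude);
	} else {
		if (magnitude > (unsigned long)INT_MAX)
			return -EINVAL;
		*out = (int)magnitude;
	}
	return 0;
}

/* Min/max variant: do the overflow check first, then apply the limits. */
static int ulong_to_int_minmax(bool negative, unsigned long magnitude,
			       int min, int max, int *out)
{
	int tmp;
	int ret = ulong_to_int_checked(negative, magnitude, &tmp);

	if (ret)
		return ret;
	if (tmp < min || tmp > max)
		return -EINVAL;
	*out = tmp;
	return 0;
}
```

      Without the first check, a magnitude above INT_MAX would be truncated
      to an int before being compared against the limits, so an
      out-of-range input could pass the min/max test.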
      
      Link: http://lkml.kernel.org/r/20190207123426.9202-3-zev@bewilderbeest.net
      Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: <stable@vger.kernel.org>	[2.6.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/memory.c: do_fault: avoid usage of stale vm_area_struct · 09417dd3
      Jan Stancek authored
      commit fc8efd2d upstream.
      
      LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
      This is a stress test in which one thread mmaps/writes/munmaps a memory
      area while another thread tries to read from it:
      
        CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
        Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
        Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
        Call Trace:
        ([<0000000000000000>]           (null))
         [<00000000001adae4>] lock_acquire+0xec/0x258
         [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
         [<000000000012a780>] page_table_free+0x48/0x1a8
         [<00000000002f6e54>] do_fault+0xdc/0x670
         [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
         [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
         [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
         [<000000000080e5ee>] pgm_check_handler+0x19e/0x200
      
      page_table_free() is called with a NULL mm parameter, but because "0" is
      a valid address on s390 (see S390_lowcore), it keeps going until it
      eventually crashes in lockdep's lock_acquire.  This crash has been
      reproducible at least since 4.14.
      
      The problem is that "vmf->vma" used in do_fault() can become stale.
      Because mmap_sem may be released, other threads can come in, call
      munmap() and cause "vma" to be returned to the kmem cache, where it gets
      zeroed/re-initialized and re-used:
      
      handle_mm_fault                           |
        __handle_mm_fault                       |
          do_fault                              |
            vma = vmf->vma                      |
            do_read_fault                       |
              __do_fault                        |
                vma->vm_ops->fault(vmf);        |
                  mmap_sem is released          |
                                                |
                                                | do_munmap()
                                                |   remove_vma_list()
                                                |     remove_vma()
                                                |       vm_area_free()
                                                |         # vma is released
                                                | ...
                                                | # same vma is allocated
                                                | # from kmem cache
                                                | do_mmap()
                                                |   vm_area_alloc()
                                                |     memset(vma, 0, ...)
                                                |
            pte_free(vma->vm_mm, ...);          |
              page_table_free                   |
                spin_lock_bh(&mm->context.lock);|
                  <crash>                       |
      
      Cache mm_struct to avoid using potentially stale "vma".
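
      The pattern behind this fix, reading the needed pointer before a call
      that may invalidate the structure, can be illustrated with a small
      user-space sketch. All types and functions here are hypothetical
      stand-ins, and the memset() merely models the vma being recycled while
      mmap_sem is dropped.

```c
#include <stddef.h>
#include <string.h>

struct mm_sketch { int id; };
struct vma_sketch { struct mm_sketch *mm; };

/* Hypothetical fault callback: models the vma being freed and zeroed
 * for re-use by another thread while the lock is not held. */
static void fault_that_recycles_vma(struct vma_sketch *vma)
{
	memset(vma, 0, sizeof(*vma));
}

/* Returns the mm that the cleanup path should operate on. */
static struct mm_sketch *handle_fault_sketch(struct vma_sketch *vma)
{
	struct mm_sketch *mm = vma->mm;	/* cache before the callback runs */

	fault_that_recycles_vma(vma);

	/* vma->mm is NULL at this point; only the cached copy is usable */
	return mm;
}
```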
      
      [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c
      
      Link: http://lkml.kernel.org/r/5b3fdf19e2a5be460a384b936f5b56e13733f1b8.1551595137.git.jstancek@redhat.com
      Signed-off-by: Jan Stancek <jstancek@redhat.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Matthew Wilcox <willy@infradead.org>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/vmalloc: fix size check for remap_vmalloc_range_partial() · c1ddc7b7
      Roman Penyaev authored
      commit 401592d2 upstream.
      
      When VM_NO_GUARD is not set, area->size includes the adjacent guard page,
      so get_vm_area_size() should be used for correct size checking instead of
      area->size.
      
      This fixes a possible kernel oops when userspace tries to mmap an area
      1 page bigger than was allocated by the vmalloc_user() call: the size
      check inside remap_vmalloc_range_partial() also counts the non-existent
      guard page, so the check passes but vmalloc_to_page() returns NULL
      (the guard page does not physically exist).
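
      The difference between the two size checks can be sketched in user
      space as below. The struct, the helper names and the fixed 4096-byte
      page size are assumptions made for illustration; this is not the
      kernel's vm_struct or its get_vm_area_size().

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for an allocated area. */
struct area_sketch {
	size_t size;	/* includes one guard page unless no_guard is set */
	bool no_guard;
};

/* Size that is actually backed by pages. */
static size_t usable_size(const struct area_sketch *a)
{
	return a->no_guard ? a->size : a->size - SKETCH_PAGE_SIZE;
}

/* A remap request fits only if it stays within the backed pages. */
static bool remap_fits(const struct area_sketch *a, size_t offset, size_t len)
{
	size_t usable = usable_size(a);

	return offset <= usable && len <= usable - offset;
}
```

      With one backed page plus a guard page (size 8192), a request for 8192
      bytes would pass a check against the raw size but is rejected by a
      check against the usable size.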
      
      The following code pattern example should trigger an oops:
      
        static int oops_mmap(struct file *file, struct vm_area_struct *vma)
        {
              void *mem;
      
              mem = vmalloc_user(4096);
              BUG_ON(!mem);
              /* Do not care about mem leak */
      
              return remap_vmalloc_range(vma, mem, 0);
        }
      
      And userspace simply mmaps size + PAGE_SIZE:
      
        mmap(NULL, 8192, PROT_WRITE|PROT_READ, MAP_PRIVATE, fd, 0);
      
      Possible candidates for oops which do not have any explicit size
      checks:
      
         *** drivers/media/usb/stkwebcam/stk-webcam.c:
         v4l_stk_mmap[789]   ret = remap_vmalloc_range(vma, sbuf->buffer, 0);
      
      Or the following one:
      
         *** drivers/video/fbdev/core/fbmem.c
         static int
         fb_mmap(struct file *file, struct vm_area_struct * vma)
              ...
              res = fb->fb_mmap(info, vma);
      
      Where fb_mmap callback calls remap_vmalloc_range() directly without any
      explicit checks:
      
         *** drivers/video/fbdev/vfb.c
         static int vfb_mmap(struct fb_info *info,
                   struct vm_area_struct *vma)
         {
             return remap_vmalloc_range(vma, (void *)info->fix.smem_start, vma->vm_pgoff);
         }
      
      Link: http://lkml.kernel.org/r/20190103145954.16942-2-rpenyaev@suse.de
      Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: hwpoison: fix thp split handing in soft_offline_in_use_page() · 234c0cc9
      zhongjiang authored
      commit 46612b75 upstream.
      
      When soft_offline_in_use_page() runs on a thp tail page after pmd is
      split, we trigger the following VM_BUG_ON_PAGE():
      
        Memory failure: 0x3755ff: non anonymous thp
        __get_any_page: 0x3755ff: unknown zero refcount page type 2fffff80000000
        Soft offlining pfn 0x34d805 at process virtual address 0x20fff000
        page:ffffea000d360140 count:0 mapcount:0 mapping:0000000000000000 index:0x1
        flags: 0x2fffff80000000()
        raw: 002fffff80000000 ffffea000d360108 ffffea000d360188 0000000000000000
        raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
        page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
        ------------[ cut here ]------------
        kernel BUG at ./include/linux/mm.h:519!
      
      soft_offline_in_use_page() passed refcount and page lock from tail page
      to head page, which is not needed because we can pass any subpage to
      split_huge_page().
      
      Naoya had fixed a similar issue in c3901e72 ("mm: hwpoison: fix thp
      split handling in memory_failure()").  But he missed fixing soft
      offline.
      
      Link: http://lkml.kernel.org/r/1551452476-24000-1-git-send-email-zhongjiang@huawei.com
      Fixes: 61f5d698 ("mm: re-enable THP")
      Signed-off-by: zhongjiang <zhongjiang@huawei.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.5+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dmaengine: usb-dmac: Make DMAC system sleep callbacks explicit · c7fd1a96
      Phuong Nguyen authored
      commit d9140a0d upstream.
      
      This commit fixes the issue that USB-DMAC hangs silently after system
      resume on R-Car Gen3, so renesas_usbhs does not work correctly when
      using USB-DMAC for bulk transfers, e.g. with ethernet or serial
      gadgets.
      
      The issue can be reproduced by these steps:
       1. modprobe g_serial
       2. Suspend and resume system.
       3. connect a usb cable to host side
       4. Transfer data from Host to Target
       5. cat /dev/ttyGS0 (Target side)
       6. echo "test" > /dev/ttyACM0 (Host side)
      
      The 'cat' prints nothing. However, the system otherwise keeps working
      normally.
      
      Currently, the USB-DMAC driver does not have system sleep callbacks, so
      it relies on the PM core to force runtime suspend/resume to suspend and
      reinitialize USB-DMAC during system resume. After commit 17218e00
      ("PM / genpd: Stop/start devices without
      pm_runtime_force_suspend/resume()"), the PM core no longer forces
      runtime suspend/resume, so this issue happens.
      
      To solve this, make system suspend/resume explicit by using
      pm_runtime_force_{suspend,resume}() as the system sleep callbacks.
      SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() is used to make sure USB-DMAC is
      suspended after and initialized before renesas_usbhs.
      Signed-off-by: Phuong Nguyen <phuong.nguyen.xw@renesas.com>
      Signed-off-by: Hiroyuki Yokoyama <hiroyuki.yokoyama.vx@renesas.com>
      Cc: <stable@vger.kernel.org> # v4.16+
      [shimoda: revise the commit log and add Cc tag]
      Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • usb: typec: tps6598x: handle block writes separately with plain-I2C adapters · 822e2185
      Nikolaus Voss authored
      commit 8a863a60 upstream.
      
      Commit 1a2f474d handles block _reads_ separately with plain-I2C
      adapters, but the problem described there, regmap-i2c not handling
      SMBus block transfers (i.e. reads and writes) correctly, also exists
      with writes.
      
      As a workaround, this patch adds a block write function the same way
      1a2f474d adds a block read function.
      
      Fixes: 1a2f474d ("usb: typec: tps6598x: handle block reads separately with plain-I2C adapters")
      Fixes: 0a4c005b ("usb: typec: driver for TI TPS6598x USB Power Delivery controllers")
      Signed-off-by: Nikolaus Voss <nikolaus.voss@loewensteinmedical.de>
      Cc: stable <stable@vger.kernel.org>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • usb: chipidea: tegra: Fix missed ci_hdrc_remove_device() · 8415e718
      Dmitry Osipenko authored
      commit 563b9372 upstream.
      
      The ChipIdea's platform device needs to be unregistered on Tegra's driver
      module removal.
      
      Fixes: dfebb5f4 ("usb: chipidea: Add support for Tegra20/30/114/124")
      Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
      Acked-by: Peter Chen <peter.chen@nxp.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • clk: ingenic: Fix doc of ingenic_cgu_div_info · b1c1ef7b
      Paul Cercueil authored
      commit 7ca4c922 upstream.
      
      The 'div' field does not represent a number of bits used to divide
      (understand: right-shift) the divider, but a number itself used to
      divide the divider.
      Signed-off-by: Paul Cercueil <paul@crapouillou.net>
      Signed-off-by: Maarten ter Huurne <maarten@treewalker.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • clk: ingenic: Fix round_rate misbehaving with non-integer dividers · 4a04611f
      Paul Cercueil authored
      commit bc5d922c upstream.
      
      Take a parent rate of 180 MHz, and a requested rate of 4.285715 MHz.
      This results in a theoretical divider of 41.999993 which is then rounded
      up to 42. The .round_rate function would then return (180 MHz / 42) as
      the clock, rounded down, so 4.285714 MHz.
      
      Calling clk_set_rate on 4.285714 MHz would round the rate again, and
      give a theoretical divider of 42.0000028, now rounded up to 43, and the
      rate returned would be (180 MHz / 43) which is 4.186046 MHz, i.e. not
      what we requested.
      
      Fix this by rounding up the divisions.
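
      The arithmetic above can be checked with a short user-space sketch. The
      helper names are hypothetical, and it assumes that "rounding up the
      divisions" means the rate reported for a divider is computed with a
      round-up division; the driver's actual code is not shown here.

```c
#include <stdint.h>

#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Smallest divider whose output does not exceed the requested rate. */
static uint64_t pick_divider(uint64_t parent_hz, uint64_t req_hz)
{
	return SKETCH_DIV_ROUND_UP(parent_hz, req_hz);
}

/* Old behaviour as described in the text: report the rate rounded down. */
static uint64_t rate_round_down(uint64_t parent_hz, uint64_t div)
{
	return parent_hz / div;
}

/* Assumed new behaviour: report the rate rounded up. */
static uint64_t rate_round_up(uint64_t parent_hz, uint64_t div)
{
	return SKETCH_DIV_ROUND_UP(parent_hz, div);
}
```

      With round-down reporting, feeding the reported rate back in picks
      divider 43 instead of 42. With round-up reporting, the reported rate
      maps back to the same divider, so repeated rounding is stable.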
      Signed-off-by: Paul Cercueil <paul@crapouillou.net>
      Tested-by: Maarten ter Huurne <maarten@treewalker.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • clk: samsung: exynos5: Fix kfree() of const memory on setting driver_override · 33e7604a
      Krzysztof Kozlowski authored
      commit 785c9f41 upstream.
      
      The platform driver_override field should not be initialized from
      const memory because the core later kfree()s it.  If driver_override is
      manually set later through sysfs, kfree() of the old value leads to:
      
          $ echo "new_value" > /sys/bus/platform/drivers/.../driver_override
      
          kernel BUG at ../mm/slub.c:3960!
          Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
          ...
          (kfree) from [<c058e8c0>] (platform_set_driver_override+0x84/0xac)
          (platform_set_driver_override) from [<c058e908>] (driver_override_store+0x20/0x34)
          (driver_override_store) from [<c031f778>] (kernfs_fop_write+0x100/0x1dc)
          (kernfs_fop_write) from [<c0296de8>] (__vfs_write+0x2c/0x17c)
          (__vfs_write) from [<c02970c4>] (vfs_write+0xa4/0x188)
          (vfs_write) from [<c02972e8>] (ksys_write+0x4c/0xac)
          (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
      
      The clk-exynos5-subcmu driver uses override only for the purpose of
      creating meaningful names for children devices (matching names of power
      domains, e.g. DISP, MFC).  The driver_override was not developed for
      this purpose so just switch to default names of devices to fix the
      issue.
      
      Fixes: b06a532b ("clk: samsung: Add Exynos5 sub-CMU clock driver")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • clk: samsung: exynos5: Fix possible NULL pointer exception on platform_device_alloc() failure · 4d1de1e6
      Krzysztof Kozlowski authored
      commit 5f0b6216 upstream.
      
      During initialization of subdevices, if platform_device_alloc() fails,
      the returned NULL pointer is later dereferenced.  Add proper error
      paths to exynos5_clk_register_subcmu().  The return value of this
      function is still ignored because at this stage of init there is nothing
      we can do.
      
      Fixes: b06a532b ("clk: samsung: Add Exynos5 sub-CMU clock driver")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • clk: clk-twl6040: Fix imprecise external abort for pdmclk · 9aba7a8f
      Tony Lindgren authored
      commit 5ae51d67 upstream.
      
      I noticed that modprobe clk-twl6040 can fail after a cold boot with:
      abe_cm:clk:0010:0: failed to enable
      ...
      Unhandled fault: imprecise external abort (0x1406) at 0xbe896b20
      
      WARNING: CPU: 1 PID: 29 at drivers/clk/clk.c:828 clk_core_disable_lock+0x18/0x24
      ...
      (clk_core_disable_lock) from [<c0123534>] (_disable_clocks+0x18/0x90)
      (_disable_clocks) from [<c0124040>] (_idle+0x17c/0x244)
      (_idle) from [<c0125ad4>] (omap_hwmod_idle+0x24/0x44)
      (omap_hwmod_idle) from [<c053a038>] (sysc_runtime_suspend+0x48/0x108)
      (sysc_runtime_suspend) from [<c06084c4>] (__rpm_callback+0x144/0x1d8)
      (__rpm_callback) from [<c0608578>] (rpm_callback+0x20/0x80)
      (rpm_callback) from [<c0607034>] (rpm_suspend+0x120/0x694)
      (rpm_suspend) from [<c0607a78>] (__pm_runtime_idle+0x60/0x84)
      (__pm_runtime_idle) from [<c053aaf0>] (sysc_probe+0x874/0xf2c)
      (sysc_probe) from [<c05fecd4>] (platform_drv_probe+0x48/0x98)
      
      After searching around for a similar issue, I came across an earlier fix
      in the Android tree for glass-omap-xrr02 that never got merged upstream.
      There is a patch "MFD: twl6040-codec: Implement PDMCLK cold temp errata"
      by Misael Lopez Cruz <misael.lopez@ti.com>.
      
      Based on my observations, this fix is also needed when cold booting
      devices, and not just for deeper idle modes. Since we now have a clock
      driver for pdmclk, let's fix the issue in twl6040_pdmclk_prepare().
      
      Cc: Misael Lopez Cruz <misael.lopez@ti.com>
      Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • clk: uniphier: Fix update register for CPU-gear · 6e02a5f5
      Kunihiko Hayashi authored
      commit 52128223 upstream.
      
      Need to set the update bit in UNIPHIER_CLK_CPUGEAR_UPD to update
      the CPU-gear value.
      
      Fixes: d08f1f0d ("clk: uniphier: add CPU-gear change (cpufreq) support")
      Cc: linux-stable@vger.kernel.org
      Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
      Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext2: Fix underflow in ext2_max_size() · 62600af3
      Jan Kara authored
      commit 1c2d1421 upstream.
      
      When an ext2 filesystem is created with a 64k block size, ext2_max_size()
      returns a value less than 0. As a result, no file can be written on this
      filesystem, since sb->maxbytes is less than 0. The core of the problem is
      that the block index tree for such a large block size is bigger than
      i_blocks can represent. Fix the computation to account for this
      possibility.
      
      File size limits computed with the new function for the full range of
      possible block sizes look like:
      
      bits file_size
      10     17247252480
      11    275415851008
      12   2196873666560
      13   2197948973056
      14   2198486220800
      15   2198754754560
      16   2198888906752
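
      A user-space sketch of a computation that produces limits of this shape
      is given below. It is an illustration under assumptions, not the
      kernel function: the name, the int64_t types and the omission of any
      final clamp are choices made for the sketch. It assumes 12 direct block
      pointers, 4-byte block pointers, and a 32-bit i_blocks counting
      512-byte sectors. Worked by hand, it yields the table values for bits
      10, 11, 12 and 16.

```c
#include <stdint.h>

#define SKETCH_NDIR_BLOCKS 12	/* direct block pointers per inode (assumed) */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical user-space sketch; 'bits' is log2 of the block size. */
static int64_t ext2_max_size_sketch(int bits)
{
	int64_t ppb = 1LL << (bits - 2);	/* block pointers per block */
	/* 32-bit count of 512-byte sectors, converted to fs blocks */
	int64_t upper_limit = ((1LL << 32) - 1) >> (bits - 9);
	int64_t res = SKETCH_NDIR_BLOCKS;
	int64_t meta_blocks;

	/* data blocks addressable through the block tree */
	res += ppb + ppb * ppb + ppb * ppb * ppb;
	if (res < upper_limit)
		return res << bits;	/* the tree is the limiting factor */

	/* i_blocks is the limit: subtract the index blocks needed */
	res = upper_limit;
	upper_limit -= SKETCH_NDIR_BLOCKS;
	meta_blocks = 1;			/* the single indirect block */
	upper_limit -= ppb;
	if (upper_limit < ppb * ppb) {
		/* the remainder fits under the double indirect block */
		meta_blocks += 1 + SKETCH_DIV_ROUND_UP(upper_limit, ppb);
		return (res - meta_blocks) << bits;
	}
	meta_blocks += 1 + ppb;			/* a full double indirect tree */
	upper_limit -= ppb * ppb;
	/* the remainder goes under the triple indirect block */
	meta_blocks += 1 + SKETCH_DIV_ROUND_UP(upper_limit, ppb) +
		       SKETCH_DIV_ROUND_UP(upper_limit, ppb * ppb);
	return (res - meta_blocks) << bits;
}
```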
      
      CC: stable@vger.kernel.org
      Reported-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cxl: Wrap iterations over afu slices inside 'afu_list_lock' · c335b493
      Vaibhav Jain authored
      commit edeb304f upstream.
      
      Within the cxl module, iteration over the array 'adapter->afu' may be racy
      at a few points, as it might be read during an EEH while its contents
      are being set to NULL because the driver is being unloaded or unbound
      from the adapter. This might result in a NULL pointer to 'struct afu'
      being dereferenced during an EEH, thereby causing a kernel oops.
      
      This patch fixes this by making sure that all accesses to the array
      'adapter->afu' are wrapped within the context of the spin-lock
      'adapter->afu_list_lock'.
      
      Fixes: 9e8df8a2 ("cxl: EEH support")
      Cc: stable@vger.kernel.org # v4.3+
      Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
      Acked-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
      Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/hfi1: Close race condition on user context disable and close · 54674984
      Michael J. Ruhl authored
      commit bc5add09 upstream.
      
      When disabling and removing a receive context, it is possible for an
      asynchronous event (e.g. an IRQ) to occur.  Because of this, there is a
      race between cleaning up the context and the context being used by the
      asynchronous event.
      
      cpu 0  (context cleanup)
          rc->ref_count-- (ref_count == 0)
          hfi1_rcd_free()
      cpu 1  (IRQ (with rcd index))
      	rcd_get_by_index()
      	lock
      	ref_count++      <-- reference count race (WARNING)
      	return rcd
      	unlock
      cpu 0
          hfi1_free_ctxtdata() <-- incorrect free location
          lock
          remove rcd from array
          unlock
          free rcd
      
      This race will cause the following WARNING trace:
      
      WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
      CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
      Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
      Call Trace:
        dump_stack+0x19/0x1b
        __warn+0xd8/0x100
        warn_slowpath_null+0x1d/0x20
        hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
        is_rcv_urgent_int+0x24/0x90 [hfi1]
        general_interrupt+0x1b6/0x210 [hfi1]
        __handle_irq_event_percpu+0x44/0x1c0
        handle_irq_event_percpu+0x32/0x80
        handle_irq_event+0x3c/0x60
        handle_edge_irq+0x7f/0x150
        handle_irq+0xe4/0x1a0
        do_IRQ+0x4d/0xf0
        common_interrupt+0x162/0x162
      
      The race can also lead to a use after free which could be similar to:
      
      general protection fault: 0000 1 SMP
      CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
      Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
      task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230
      Call Trace:
        ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
        hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
        hfi1_aio_write+0xba/0x110 [hfi1]
        do_sync_readv_writev+0x7b/0xd0
        do_readv_writev+0xce/0x260
        ? handle_mm_fault+0x39d/0x9b0
        ? pick_next_task_fair+0x5f/0x1b0
        ? sched_clock_cpu+0x85/0xc0
        ? __schedule+0x13a/0x890
        vfs_writev+0x35/0x60
        SyS_writev+0x7f/0x110
        system_call_fastpath+0x22/0x27
      
      Use the appropriate kref API to verify access.
      
      Reorder context cleanup to ensure context removal before cleanup occurs
      correctly.
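
      One way to close this kind of lookup race is a "get unless zero"
      operation: the lookup takes a reference only if the count has not
      already dropped to zero. The sketch below shows the idea with C11
      atomics in user space. It is an assumption about the general technique,
      not the driver's code, and the function name is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still live.
 * Returns false when the count has already reached zero, in which
 * case the caller must treat the object as gone.
 */
static bool ref_get_unless_zero(atomic_int *refcount)
{
	int old = atomic_load(refcount);

	while (old != 0) {
		/* on failure, 'old' is reloaded with the current value */
		if (atomic_compare_exchange_weak(refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

      A lookup from interrupt context would then skip any context for which
      this returns false instead of incrementing a zero count.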
      
      Cc: stable@vger.kernel.org # v4.14.0+
      Fixes: f683c80c ("IB/hfi1: Resolve kernel panics by reference counting receive contexts")
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: dwc: skip MSI init if MSIs have been explicitly disabled · 09bc2f5a
      Lucas Stach authored
      commit 3afc8299 upstream.
      
      Since 7c5925af (PCI: dwc: Move MSI IRQs allocation to IRQ domains
      hierarchical API) the MSI init claims one of the controller IRQs as a
      chained IRQ line for the MSI controller. On some designs, like the i.MX6,
      this line is shared with a PCIe legacy IRQ. When the line is claimed for
      the MSI domain, any device trying to use this legacy IRQ will fail to
      request this IRQ line.
      
      As MSI and legacy IRQs are already mutually exclusive on the DWC core,
      as the core won't forward any legacy IRQs once any MSI has been enabled,
      users wishing to use legacy IRQs already need to explicitly disable MSI
      support (usually via the pci=nomsi kernel command line option). To avoid
      any issues with MSI conflicting with legacy IRQs, just skip all of the
      DWC MSI initialization, including the IRQ line claim, when MSI is disabled.
      
      Fixes: 7c5925af ("PCI: dwc: Move MSI IRQs allocation to IRQ domains hierarchical API")
      Tested-by: Tim Harvey <tharvey@gateworks.com>
      Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI/DPC: Fix print AER status in DPC event handling · 13a9d14f
      Dongdong Liu authored
      commit 9f08a5d8 upstream.
      
      Previously dpc_handler() called aer_get_device_error_info() without
      initializing info->severity, so aer_get_device_error_info() relied on
      uninitialized data.
      
      Add dpc_get_aer_uncorrect_severity() to read the port's AER status, mask,
      and severity registers and set info->severity.
      
      Also, clear the port's AER fatal error status bits.
      
      Fixes: 8aefa9b0 ("PCI/DPC: Print AER status in DPC event handling")
      Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
      [bhelgaas: changelog]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Cc: stable@vger.kernel.org	# v4.19+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI/ASPM: Use LTR if already enabled by platform · c733cf4a
      Bjorn Helgaas authored
      commit 10ecc818 upstream.
      
      RussianNeuroMancer reported that the Intel 7265 wifi on a Dell Venue 11 Pro
      7140 tablet stopped working after wakeup from suspend and bisected the
      problem to 9ab105de ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't
      have LTR").  David Ward reported the same problem on a Dell Latitude 7350.
      
      After af8bb9f8 ("PCI/ACPI: Request LTR control from platform before
      using it"), we don't enable LTR unless the platform has granted LTR control
      to us.  In addition, we don't notice if the platform had already enabled
      LTR itself.
      
      After 9ab105de ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't have
      LTR"), we avoid using LTR if we don't think the path to the device has LTR
      enabled.
      
      The combination means that if the platform itself enables LTR but declines
      to give the OS control over LTR, we unnecessarily avoided using ASPM L1.2.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201469
      Fixes: 9ab105de ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't have LTR")
      Fixes: af8bb9f8 ("PCI/ACPI: Request LTR control from platform before using it")
      Reported-by: RussianNeuroMancer <russianneuromancer@ya.ru>
      Reported-by: David Ward <david.ward@ll.mit.edu>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org	# v4.18+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix crash during online resizing · 8a4fdc64
      Jan Kara authored
      commit f96c3ac8 upstream.
      
      When computing maximum size of filesystem possible with given number of
      group descriptor blocks, we forget to include s_first_data_block into
      the number of blocks. Thus for filesystems with non-zero
      s_first_data_block it can happen that computed maximum filesystem size
      is actually lower than current filesystem size which confuses the code
      and eventually leads to a BUG_ON in ext4_alloc_group_tables() hitting on
      flex_gd->count == 0. The problem can be reproduced like:
      
      truncate -s 100g /tmp/image
      mkfs.ext4 -b 1024 -E resize=262144 /tmp/image 32768
      mount -t ext4 -o loop /tmp/image /mnt
      resize2fs /dev/loop0 262145
      resize2fs /dev/loop0 300000
      
      Fix the problem by properly including s_first_data_block into the
      computed number of filesystem blocks.
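      
      The arithmetic can be sketched as follows. The function names are
      hypothetical and this is not the code in fs/ext4/resize.c; it only shows
      how leaving out s_first_data_block makes the computed maximum one block
      too small on a filesystem whose first data block is 1.

```c
#include <stdint.h>

/* Maximum block count for n_groups groups, ignoring the first data block. */
uint64_t max_blocks_buggy(uint64_t n_groups, uint64_t blocks_per_group)
{
    return n_groups * blocks_per_group;
}

/* Same computation with the first data block added, as the fix describes. */
uint64_t max_blocks_fixed(uint64_t n_groups, uint64_t blocks_per_group,
                          uint64_t first_data_block)
{
    return n_groups * blocks_per_group + first_data_block;
}
```

      With 1 KiB blocks there are 8192 blocks per group and
      s_first_data_block is 1. Assuming 32 groups purely for illustration, the
      first function yields 262144 and the second 262145, so a filesystem that
      is already 262145 blocks long exceeds the maximum computed without the
      offset.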
      
      Fixes: 1c6bd717 "ext4: convert file system to meta_bg if needed..."
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a4fdc64
    • yangerkun's avatar
      ext4: add mask of ext4 flags to swap · a0d876c7
      yangerkun authored
      commit abdc644e upstream.
      
      When swapping two inodes, we swap the flags too. Some flags, such as
      EXT4_JOURNAL_DATA_FL, can cause real confusion because we do not reset
      the address operations structure.  The simplest way to keep things sane
      is to restrict the flags that can be swapped.
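      
      A masked swap of this kind can be sketched as below. The flag values and
      names are illustrative stand-ins, not the real EXT4_*_FL constants, and
      the helper is hypothetical rather than the function used in the patch.

```c
#include <stdint.h>

/* Illustrative flag bits; these are not the real EXT4_*_FL values. */
#define FL_SWAPPABLE_A 0x1u
#define FL_SWAPPABLE_B 0x2u
#define FL_STICKY      0x4u  /* stands in for a flag like EXT4_JOURNAL_DATA_FL */
#define FL_SWAP_MASK   (FL_SWAPPABLE_A | FL_SWAPPABLE_B)

/* Exchange only the bits in mask; every other bit stays with its inode. */
void swap_masked_flags(uint32_t *flags1, uint32_t *flags2, uint32_t mask)
{
    uint32_t tmp = *flags1 & mask;

    *flags1 = (*flags2 & mask) | (*flags1 & ~mask);
    *flags2 = tmp | (*flags2 & ~mask);
}
```

      Starting from flags1 = FL_SWAPPABLE_A | FL_STICKY and
      flags2 = FL_SWAPPABLE_B, a call with FL_SWAP_MASK exchanges the two
      swappable bits while FL_STICKY stays on the first inode.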
      
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0d876c7
    • yangerkun's avatar
      ext4: update quota information while swapping boot loader inode · 048bfb5b
      yangerkun authored
      commit aa507b5f upstream.
      
      When swapping two inodes, i_data is swapped without updating the quota
      information. Also, swap_inode_boot_loader can sometimes "revert" the
      swap, so update the quota only after all operations have finished.
      
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      048bfb5b
    • yangerkun's avatar
      ext4: cleanup pagecache before swap i_data · 071f6816
      yangerkun authored
      commit a46c68a3 upstream.
      
      Before swapping i_data between two inodes, make sure no new dirty pages
      can appear:
      1. Take i_mmap_sem for writing to keep mmap reads and writes from adding
      new pagecache pages;
      2. Change filemap_flush to filemap_write_and_wait and move the calls into
      the region protected by the inode lock to keep buffered reads and writes
      from adding new pagecache pages.
      
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      071f6816
    • yangerkun's avatar
      ext4: fix check of inode in swap_inode_boot_loader · cdf9941b
      yangerkun authored
      commit 67a11611 upstream.
      
      Before actually swapping an inode with the boot loader inode, several
      conditions are checked to reject invalid or unpermitted operations, for
      example whether the inode has inline data. These checks must be done
      under the inode lock so the conditions cannot change while swapping.
      Some of the other conditions cannot change during the swap, but there is
      no harm in checking them under the inode lock as well.
      
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cdf9941b
    • Arnd Bergmann's avatar
      cpufreq: pxa2xx: remove incorrect __init annotation · ae228aca
      Arnd Bergmann authored
      commit 9505b98c upstream.
      
      pxa_cpufreq_init_voltages() is marked __init but usually inlined into
      the non-__init pxa_cpufreq_init() function. When building with clang,
      it can stay as a standalone function in a discarded section, and produce
      this warning:
      
      WARNING: vmlinux.o(.text+0x616a00): Section mismatch in reference from the function pxa_cpufreq_init() to the function .init.text:pxa_cpufreq_init_voltages()
      The function pxa_cpufreq_init() references
      the function __init pxa_cpufreq_init_voltages().
      This is often because pxa_cpufreq_init lacks a __init
      annotation or the annotation of pxa_cpufreq_init_voltages is wrong.
      
      Fixes: 50e77fcd ("ARM: pxa: remove __init from cpufreq_driver->init()")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Cc: All applicable <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae228aca
    • Yangtao Li's avatar
      cpufreq: tegra124: add missing of_node_put() · f65b34d0
      Yangtao Li authored
      commit 446fae2b upstream.
      
      of_cpu_device_node_get() will increase the refcount of device_node,
      it is necessary to call of_node_put() at the end to release the
      refcount.
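      
      The get/put pairing can be illustrated with a userspace model. The
      struct and helpers below are hypothetical stand-ins, not
      struct device_node or the real of_cpu_device_node_get() and
      of_node_put(); the sketch only shows that every exit path taken after
      the get must drop the reference again.

```c
#include <stddef.h>

/* Hypothetical reference-counted node; not struct device_node. */
struct node {
    int refcount;
};

/* Take a reference, as a lookup helper would before returning the node. */
struct node *node_get(struct node *n)
{
    if (n != NULL)
        n->refcount++;
    return n;
}

/* Drop a reference previously taken by node_get(). */
void node_put(struct node *n)
{
    if (n != NULL)
        n->refcount--;
}

/* Every exit path after a successful node_get() reaches node_put(). */
int use_node(struct node *n, int fail_early)
{
    struct node *ref = node_get(n);
    int ret = 0;

    if (ref == NULL)
        return -1;
    if (fail_early) {
        ret = -1;
        goto out;
    }
    /* ... work with ref would go here ... */
out:
    node_put(ref);
    return ret;
}
```

      Whether use_node() succeeds or fails early, the count seen by the caller
      is the same before and after the call.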
      
      Fixes: 9eb15dbb ("cpufreq: Add cpufreq driver for Tegra124")
      Cc: <stable@vger.kernel.org> # 4.4+
      Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
      Acked-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f65b34d0
    • Viresh Kumar's avatar
      cpufreq: kryo: Release OPP tables on module removal · 33565a76
      Viresh Kumar authored
      commit 0334906c upstream.
      
      Commit 5ad7346b ("cpufreq: kryo: Add module remove and exit") made
      it possible to build the kryo cpufreq driver as a module, but it failed
      to release all the resources, i.e. OPP tables, when the module is
      unloaded.
      
      This patch fixes it by releasing the OPP tables, by calling
      dev_pm_opp_put_supported_hw() for them, from the
      qcom_cpufreq_kryo_remove() routine. The array of pointers to the OPP
      tables is also allocated dynamically now in qcom_cpufreq_kryo_probe(),
      as the pointers will be required while releasing the resources.
      
      Compile tested only.
      
      Cc: 4.18+ <stable@vger.kernel.org> # v4.18+
      Fixes: 5ad7346b ("cpufreq: kryo: Add module remove and exit")
      Reviewed-by: Georgi Djakov <georgi.djakov@linaro.org>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      33565a76
    • Masami Hiramatsu's avatar
      x86/kprobes: Prohibit probing on optprobe template code · ee7d297f
      Masami Hiramatsu authored
      commit 0192e653 upstream.
      
      Prohibit probing on the optprobe template code, since it is not code
      to be executed but a template instruction sequence. If this template
      is modified, every copy made from it will be broken.
      
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrea Righi <righi.andrea@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 9326638c ("kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation")
      Link: http://lkml.kernel.org/r/154998787911.31052.15274376330136234452.stgit@devbox
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee7d297f
    • Doug Berger's avatar
      irqchip/brcmstb-l2: Use _irqsave locking variants in non-interrupt code · a477075e
      Doug Berger authored
      commit 33517881 upstream.
      
      Using the irq_gc_lock/irq_gc_unlock functions in the suspend and
      resume functions creates the opportunity for a deadlock during
      suspend, resume, and shutdown. Using the irq_gc_lock_irqsave/
      irq_gc_unlock_irqrestore variants prevents this possible deadlock.
      
      Cc: stable@vger.kernel.org
      Fixes: 7f646e92 ("irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller")
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      [maz: tidied up $SUBJECT]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a477075e
    • Zenghui Yu's avatar
      irqchip/gic-v3-its: Avoid parsing _indirect_ twice for Device table · c8666ede
      Zenghui Yu authored
      commit 8d565748 upstream.
      
      With the current logic, its_parse_indirect_baser() is invoked twice
      when allocating Device tables. Add a *break* to skip the unnecessary
      (and possibly annoying) second invocation.
      
      Fixes: 32bd44dc ("irqchip/gic-v3-its: Fix the incorrect parsing of VCPU table size")
      Cc: stable@vger.kernel.org
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8666ede
    • Lubomir Rintel's avatar
      libertas_tf: don't set URB_ZERO_PACKET on IN USB transfer · b92fad69
      Lubomir Rintel authored
      commit 607076a9 upstream.
      
      Setting the flag on an IN transfer makes no sense, and the USB core warns
      on each submission of such a URB, easily flooding the message buffer with
      tracebacks.
      
      An analogous issue was fixed in the regular libertas driver in commit 6528d880
      ("libertas: don't set URB_ZERO_PACKET on IN USB transfer").
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
      Reviewed-by: Steve deRosier <derosier@cal-sierra.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b92fad69
    • Stephen Boyd's avatar
      soc: qcom: rpmh: Avoid accessing freed memory from batch API · 02c55be5
      Stephen Boyd authored
      commit baef1c90 upstream.
      
      Using the batch API from the interconnect driver sometimes leads to a
      KASAN error due to an access to freed memory. This is easier to trigger
      with threadirqs on the kernel commandline.
      
       BUG: KASAN: use-after-free in rpmh_tx_done+0x114/0x12c
       Read of size 1 at addr fffffff51414ad84 by task irq/110-apps_rs/57
      
       CPU: 0 PID: 57 Comm: irq/110-apps_rs Tainted: G        W         4.19.10 #72
       Call trace:
        dump_backtrace+0x0/0x2f8
        show_stack+0x20/0x2c
        __dump_stack+0x20/0x28
        dump_stack+0xcc/0x10c
        print_address_description+0x74/0x240
        kasan_report+0x250/0x26c
        __asan_report_load1_noabort+0x20/0x2c
        rpmh_tx_done+0x114/0x12c
        tcs_tx_done+0x450/0x768
        irq_forced_thread_fn+0x58/0x9c
        irq_thread+0x120/0x1dc
        kthread+0x248/0x260
        ret_from_fork+0x10/0x18
      
       Allocated by task 385:
        kasan_kmalloc+0xac/0x148
        __kmalloc+0x170/0x1e4
        rpmh_write_batch+0x174/0x540
        qcom_icc_set+0x8dc/0x9ac
        icc_set+0x288/0x2e8
        a6xx_gmu_stop+0x320/0x3c0
        a6xx_pm_suspend+0x108/0x124
        adreno_suspend+0x50/0x60
        pm_generic_runtime_suspend+0x60/0x78
        __rpm_callback+0x214/0x32c
        rpm_callback+0x54/0x184
        rpm_suspend+0x3f8/0xa90
        pm_runtime_work+0xb4/0x178
        process_one_work+0x544/0xbc0
        worker_thread+0x514/0x7d0
        kthread+0x248/0x260
        ret_from_fork+0x10/0x18
      
       Freed by task 385:
        __kasan_slab_free+0x12c/0x1e0
        kasan_slab_free+0x10/0x1c
        kfree+0x134/0x588
        rpmh_write_batch+0x49c/0x540
        qcom_icc_set+0x8dc/0x9ac
        icc_set+0x288/0x2e8
        a6xx_gmu_stop+0x320/0x3c0
        a6xx_pm_suspend+0x108/0x124
        adreno_suspend+0x50/0x60
       cr50_spi spi5.0: SPI transfer timed out
        pm_generic_runtime_suspend+0x60/0x78
        __rpm_callback+0x214/0x32c
        rpm_callback+0x54/0x184
        rpm_suspend+0x3f8/0xa90
        pm_runtime_work+0xb4/0x178
        process_one_work+0x544/0xbc0
        worker_thread+0x514/0x7d0
        kthread+0x248/0x260
        ret_from_fork+0x10/0x18
      
       The buggy address belongs to the object at fffffff51414ac80
        which belongs to the cache kmalloc-512 of size 512
       The buggy address is located 260 bytes inside of
        512-byte region [fffffff51414ac80, fffffff51414ae80)
       The buggy address belongs to the page:
       page:ffffffbfd4505200 count:1 mapcount:0 mapping:fffffff51e00c680 index:0x0 compound_mapcount: 0
       flags: 0x4000000000008100(slab|head)
       raw: 4000000000008100 ffffffbfd4529008 ffffffbfd44f9208 fffffff51e00c680
       raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        fffffff51414ac80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        fffffff51414ad00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       >fffffff51414ad80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                          ^
        fffffff51414ae00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        fffffff51414ae80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      
      The batch API sets the same completion for each rpmh message that's sent
      and then loops through all the messages and waits for that single
      completion declared on the stack to be completed before returning from
      the function and freeing the message structures. Unfortunately, some
      messages may still be in process and 'stuck' in the TCS. At some later
      point, the tcs_tx_done() interrupt will run and try to process messages
      that have already been freed at the end of rpmh_write_batch(). This will
      in turn access the 'needs_free' member of the rpmh_request structure and
      cause KASAN to complain. Furthermore, if there's a message that's
      completed in rpmh_tx_done() and freed immediately after the complete()
      call is made we'll be racing with potentially freed memory when
      accessing the 'needs_free' member:
      
      	CPU0                         CPU1
      	----                         ----
      	rpmh_tx_done()
      	 complete(&compl)
      	                             wait_for_completion(&compl)
      	                             kfree(rpm_msg)
      	 if (rpm_msg->needs_free)
      	 <KASAN warning splat>
      
      Let's fix this by allocating a chunk of completions for each message and
      waiting for all of them to be completed before returning from the batch
      API. Alternatively, we could wait for the last message in the batch, but
      that may be a more complicated change because it looks like
      tcs_tx_done() just iterates through the indices of the queue and
      completes each message instead of tracking the last inserted message and
      completing that first.
      
      Fixes: c8790cb6 ("drivers: qcom: rpmh: add support for batch RPMH request")
      Cc: Lina Iyer <ilina@codeaurora.org>
      Cc: "Raju P.L.S.S.S.N" <rplsssn@codeaurora.org>
      Cc: Matthias Kaehlcke <mka@chromium.org>
      Cc: Evan Green <evgreen@chromium.org>
      Cc: stable@vger.kernel.org
      Reviewed-by: Lina Iyer <ilina@codeaurora.org>
      Reviewed-by: Evan Green <evgreen@chromium.org>
      Signed-off-by: Stephen Boyd <swboyd@chromium.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Andy Gross <andy.gross@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      02c55be5
    • Filipe Manana's avatar
      Btrfs: fix corruption reading shared and compressed extents after hole punching · 898488e2
      Filipe Manana authored
      commit 8e928218 upstream.
      
      In the past we had data corruption when reading compressed extents that
      are shared within the same file and they are consecutive, this got fixed
      by commit 005efedf ("Btrfs: fix read corruption of compressed and
      shared extents") and by commit 808f80b4 ("Btrfs: update fix for read
      corruption of compressed and shared extents"). However there was a case
      that was missing in those fixes, which is when the shared and compressed
      extents are referenced with a non-zero offset. The following shell script
      creates a reproducer for this issue:
      
        #!/bin/bash
      
        mkfs.btrfs -f /dev/sdc &> /dev/null
        mount -o compress /dev/sdc /mnt/sdc
      
        # Create a file with 3 consecutive compressed extents, each has an
        # uncompressed size of 128Kb and a compressed size of 4Kb.
        for ((i = 1; i <= 3; i++)); do
            head -c 4096 /dev/zero
            for ((j = 1; j <= 31; j++)); do
                head -c 4096 /dev/zero | tr '\0' "\377"
            done
        done > /mnt/sdc/foobar
        sync
      
        echo "Digest after file creation:   $(md5sum /mnt/sdc/foobar)"
      
        # Clone the first extent into offsets 128K and 256K.
        xfs_io -c "reflink /mnt/sdc/foobar 0 128K 128K" /mnt/sdc/foobar
        xfs_io -c "reflink /mnt/sdc/foobar 0 256K 128K" /mnt/sdc/foobar
        sync
      
        echo "Digest after cloning:         $(md5sum /mnt/sdc/foobar)"
      
        # Punch holes into the regions that are already full of zeroes.
        xfs_io -c "fpunch 0 4K" /mnt/sdc/foobar
        xfs_io -c "fpunch 128K 4K" /mnt/sdc/foobar
        xfs_io -c "fpunch 256K 4K" /mnt/sdc/foobar
        sync
      
        echo "Digest after hole punching:   $(md5sum /mnt/sdc/foobar)"
      
        echo "Dropping page cache..."
        sysctl -q vm.drop_caches=1
        echo "Digest after hole punching:   $(md5sum /mnt/sdc/foobar)"
      
        umount /dev/sdc
      
      When running the script we get the following output:
      
        Digest after file creation:   5a0888d80d7ab1fd31c229f83a3bbcc8  /mnt/sdc/foobar
        linked 131072/131072 bytes at offset 131072
        128 KiB, 1 ops; 0.0033 sec (36.960 MiB/sec and 295.6830 ops/sec)
        linked 131072/131072 bytes at offset 262144
        128 KiB, 1 ops; 0.0015 sec (78.567 MiB/sec and 628.5355 ops/sec)
        Digest after cloning:         5a0888d80d7ab1fd31c229f83a3bbcc8  /mnt/sdc/foobar
        Digest after hole punching:   5a0888d80d7ab1fd31c229f83a3bbcc8  /mnt/sdc/foobar
        Dropping page cache...
        Digest after hole punching:   fba694ae8664ed0c2e9ff8937e7f1484  /mnt/sdc/foobar
      
      This happens because, after reading all the pages of the extent in the
      range from 128K to 256K for example, we read the hole at offset 256K.
      Then, when reading the page at offset 260K, we don't submit the existing
      bio, which is responsible for filling only the pages in the range 128K
      to 256K. Instead, the pages from the range 260K to 384K are added to the
      existing bio, which is submitted after iterating over the entire range.
      Once the bio completes, the uncompressed data fills only the pages in
      the range 128K to 256K because no more data is read from disk, leaving
      the pages in the range 260K to 384K unfilled. It is just a slightly
      different variant of what was solved by commit 005efedf ("Btrfs: fix
      read corruption of compressed and shared extents").
      
      Fix this by forcing a bio submit, during readpages(), whenever we find a
      compressed extent map for a page that is different from the extent map
      for the previous page or has a different starting offset (in case it's
      the same compressed extent), instead of the extent map's original start
      offset.
      
      A test case for fstests follows soon.
      
      Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Fixes: 808f80b4 ("Btrfs: update fix for read corruption of compressed and shared extents")
      Fixes: 005efedf ("Btrfs: fix read corruption of compressed and shared extents")
      Cc: stable@vger.kernel.org # 4.3+
      Tested-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      898488e2
    • Johannes Thumshirn's avatar
      btrfs: ensure that a DUP or RAID1 block group has exactly two stripes · 1a00f7fd
      Johannes Thumshirn authored
      commit 349ae63f upstream.
      
      We recently had a customer issue with a corrupted filesystem. When
      trying to mount this image btrfs panicked with a division by zero in
      calc_stripe_length().
      
      The corrupt chunk had a 'num_stripes' value of 1. calc_stripe_length()
      takes this value and divides it by the number of copies the RAID profile
      is expected to have to calculate the amount of data stripes. As a DUP
      profile is expected to have 2 copies this division resulted in 1/2 = 0.
      Later, the 'data_stripes' variable is used as a divisor in the
      stripe length calculation, which results in a division by 0 and thus a
      kernel panic.
      
      When encountering a filesystem with a DUP block group and a
      'num_stripes' value unequal to 2, refuse mounting as the image is
      corrupted and will lead to unexpected behaviour.
      
      Code inspection showed a RAID1 block group has the same issues.
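      
      The failure and the new check can be sketched in plain C. The enum and
      function names are hypothetical and this is not the btrfs validation
      code; the handling of profiles other than DUP and RAID1 is an assumption
      made only to keep the example complete.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical profile tags; the kernel uses BTRFS_BLOCK_GROUP_* bit flags. */
enum profile { PROFILE_SINGLE, PROFILE_DUP, PROFILE_RAID1 };

/* Integer division as described above: 1 stripe / 2 copies yields 0. */
uint32_t data_stripes(uint32_t num_stripes, uint32_t ncopies)
{
    return num_stripes / ncopies;
}

/* Reject DUP and RAID1 chunks that do not have exactly two stripes. */
bool chunk_stripes_valid(enum profile p, uint32_t num_stripes)
{
    if (p == PROFILE_DUP || p == PROFILE_RAID1)
        return num_stripes == 2;
    /* Assumption: other profiles only need at least one stripe here. */
    return num_stripes >= 1;
}
```

      data_stripes(1, 2) returns 0, the value that later ends up as a divisor.
      Calling chunk_stripes_valid(PROFILE_DUP, 1) first returns false, so such
      a chunk would be rejected before any division takes place.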
      
      Fixes: e06cd3dd ("Btrfs: add validadtion checks for chunk loading")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a00f7fd
    • Filipe Manana's avatar
      Btrfs: setup a nofs context for memory allocation at __btrfs_set_acl · 6e24f5a1
      Filipe Manana authored
      commit a0873490 upstream.
      
      We are holding a transaction handle when setting an acl, therefore we can
      not allocate the xattr value buffer using GFP_KERNEL, as we could deadlock
      if reclaim is triggered by the allocation, therefore setup a nofs context.
      
      Fixes: 39a27ec1 ("btrfs: use GFP_KERNEL for xattr and acl allocations")
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e24f5a1
    • Filipe Manana's avatar
      Btrfs: setup a nofs context for memory allocation at btrfs_create_tree() · 61f92096
      Filipe Manana authored
      commit b89f6d1f upstream.
      
      We are holding a transaction handle when creating a tree, therefore we can
      not allocate the root using GFP_KERNEL, as we could deadlock if reclaim is
      triggered by the allocation, therefore setup a nofs context.
      
      Fixes: 74e4d827 ("btrfs: let callers of btrfs_alloc_root pass gfp flags")
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      61f92096
    • Finn Thain's avatar
      m68k: Add -ffreestanding to CFLAGS · fcbf12e2
      Finn Thain authored
      commit 28713169 upstream.
      
      This patch fixes a build failure when using GCC 8.1:
      
      /usr/bin/ld: block/partitions/ldm.o: in function `ldm_parse_tocblock':
      block/partitions/ldm.c:153: undefined reference to `strcmp'
      
      This is caused by a new optimization which effectively replaces a
      strncmp() call with a strcmp() call. This affects a number of strncmp()
      call sites in the kernel.
      
      The entire class of optimizations is avoided with -fno-builtin, which
      gets enabled by -ffreestanding. This may avoid possible future build
      failures in case new optimizations appear in future compilers.
      
      I haven't done any performance measurements with this patch but I did
      count the function calls in a defconfig build. For example, there are now
      23 more sprintf() calls and 39 fewer strcpy() calls. The effect on the
      other libc functions is smaller.
      
      If this harms performance we can tackle that regression by optimizing
      the call sites, ideally using semantic patches. That way, clang and ICC
      builds might benefit too.
      
      Cc: stable@vger.kernel.org
      Reference: https://marc.info/?l=linux-m68k&m=154514816222244&w=2
      Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fcbf12e2
    • Vivek Goyal's avatar
      ovl: Do not lose security.capability xattr over metadata file copy-up · 205f149f
      Vivek Goyal authored
      commit 993a0b2a upstream.
      
      If a file has been copied up metadata only, and later data is copied up,
      upper loses any security.capability xattr it has (the underlying
      filesystem clears it upon file write).
      
      From a user's point of view, this is just a file copy-up and that should
      not result in losing security.capability xattr.  Hence, before data copy
      up, save security.capability xattr (if any) and restore it on upper after
      data copy up is complete.
      
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Fixes: 0c288874 ("ovl: A new xattr OVL_XATTR_METACOPY for file on upper")
      Cc: <stable@vger.kernel.org> # v4.19+
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      205f149f
    • Vivek Goyal's avatar
      ovl: During copy up, first copy up data and then xattrs · 6f048ae2
      Vivek Goyal authored
      commit 5f32879e upstream.
      
      If a file with a capability set (and hence a security.capability xattr)
      is written, the kernel clears the security.capability xattr. For overlay,
      during file copy up, xattrs are copied up first and then data is copied
      up. This means data copy up clears the security.capability xattr that
      the lower file has, which can lead to surprises. If a lower file has
      CAP_SETUID, it should not be cleared over copy up (if nothing was
      actually written to the file).
      
      This also creates problems with chown logic where it first copies up file
      and then tries to clear setuid bit. But by that time security.capability
      xattr is already gone (due to data copy up), and caller gets -ENODATA.
      This has been reported by Giuseppe here.
      
      https://github.com/containers/libpod/issues/2015#issuecomment-447824842
      
      Fix this by copying up data first and then metadata. This is a regression
      which has been introduced by my commit as part of metadata only copy up
      patches.
      
      TODO: There will be some corner cases where a file is copied up metadata
      only and later data copy up happens and that will clear security.capability
      xattr. Something needs to be done about that too.
      
      Fixes: bd64e575 ("ovl: During copy up, first copy up metadata and then data")
      Cc: <stable@vger.kernel.org> # v4.19+
      Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f048ae2
    • Jann Horn's avatar
      splice: don't merge into linked buffers · 2af926fd
      Jann Horn authored
      commit a0ce2f0a upstream.
      
      Before this patch, it was possible for two pipes to affect each other after
      data had been transferred between them with tee():
      
      ============
      $ cat tee_test.c
      #define _GNU_SOURCE
      #include <err.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      
      int main(void) {
        int pipe_a[2];
        if (pipe(pipe_a)) err(1, "pipe");
        int pipe_b[2];
        if (pipe(pipe_b)) err(1, "pipe");
        if (write(pipe_a[1], "abcd", 4) != 4) err(1, "write");
        if (tee(pipe_a[0], pipe_b[1], 2, 0) != 2) err(1, "tee");
        if (write(pipe_b[1], "xx", 2) != 2) err(1, "write");
      
        char buf[5];
        if (read(pipe_a[0], buf, 4) != 4) err(1, "read");
        buf[4] = 0;
        printf("got back: '%s'\n", buf);
      }
      $ gcc -o tee_test tee_test.c
      $ ./tee_test
      got back: 'abxx'
      $
      ============
      
      As suggested by Al Viro, fix it by creating a separate type for
      non-mergeable pipe buffers, then changing the types of buffers in
      splice_pipe_to_pipe() and link_pipe().
      
      Cc: <stable@vger.kernel.org>
      Fixes: 7c77f0b3 ("splice: implement pipe to pipe splicing")
      Fixes: 70524490 ("[PATCH] splice: add support for sys_tee()")
      Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2af926fd