1. 15 Apr, 2022 30 commits
    • Merge branch 'akpm' (patches from Andrew) · 59250f8a
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "14 patches.
      
        Subsystems affected by this patch series: MAINTAINERS, binfmt, and
        mm (tmpfs, secretmem, kasan, kfence, pagealloc, zram, compaction,
        hugetlb, vmalloc, and kmemleak)"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
        mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
        revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"
        revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders"
        hugetlb: do not demote poisoned hugetlb pages
        mm: compaction: fix compiler warning when CONFIG_COMPACTION=n
        mm: fix unexpected zeroed page mapping with zram swap
        mm, page_alloc: fix build_zonerefs_node()
        mm, kfence: support kmem_dump_obj() for KFENCE objects
        kasan: fix hw tags enablement when KUNIT tests are disabled
        irq_work: use kasan_record_aux_stack_noalloc() record callstack
        mm/secretmem: fix panic when growing a memfd_secret
        tmpfs: fix regressions from wider use of ZERO_PAGE
        MAINTAINERS: Broadcom internal lists aren't maintainers
    • Merge tag 'for-5.18/dm-fixes-2' of... · ce673f63
      Linus Torvalds authored
      Merge tag 'for-5.18/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - Fix memory corruption in DM integrity target when tag_size is less
         than digest size.
      
       - Fix DM multipath's historical-service-time path selector to not use
         sched_clock() and ktime_get_ns(); only use ktime_get_ns().
      
       - Fix dm_io->orig_bio NULL pointer dereference in dm_zone_map_bio() due
         to 5.18 changes that overlooked DM zone's use of ->orig_bio
      
       - Fix for regression that broke the use of dm_accept_partial_bio() for
         "abnormal" IO (e.g. WRITE ZEROES) that does not need duplicate bios
      
       - Fix DM's issuing of empty flush bio so that its size is 0.
      
      * tag 'for-5.18/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm: fix bio length of empty flush
        dm: allow dm_accept_partial_bio() for dm_io without duplicate bios
        dm zone: fix NULL pointer dereference in dm_zone_map_bio
        dm mpath: only use ktime_get_ns() in historical selector
        dm integrity: fix memory corruption when tag_size is less than digest size
    • mm: kmemleak: take a full lowmem check in kmemleak_*_phys() · 23c2d497
      Patrick Wang authored
      The kmemleak_*_phys() APIs do not check the address against lowmem's
      minimum boundary, so a caller that passes an address below lowmem
      triggers an oops:
      
        # echo scan > /sys/kernel/debug/kmemleak
        Unable to handle kernel paging request at virtual address ff5fffffffe00000
        Oops [#1]
        Modules linked in:
        CPU: 2 PID: 134 Comm: bash Not tainted 5.18.0-rc1-next-20220407 #33
        Hardware name: riscv-virtio,qemu (DT)
        epc : scan_block+0x74/0x15c
         ra : scan_block+0x72/0x15c
        epc : ffffffff801e5806 ra : ffffffff801e5804 sp : ff200000104abc30
         gp : ffffffff815cd4e8 tp : ff60000004cfa340 t0 : 0000000000000200
         t1 : 00aaaaaac23954cc t2 : 00000000000003ff s0 : ff200000104abc90
         s1 : ffffffff81b0ff28 a0 : 0000000000000000 a1 : ff5fffffffe01000
         a2 : ffffffff81b0ff28 a3 : 0000000000000002 a4 : 0000000000000001
         a5 : 0000000000000000 a6 : ff200000104abd7c a7 : 0000000000000005
         s2 : ff5fffffffe00ff9 s3 : ffffffff815cd998 s4 : ffffffff815d0e90
         s5 : ffffffff81b0ff28 s6 : 0000000000000020 s7 : ffffffff815d0eb0
         s8 : ffffffffffffffff s9 : ff5fffffffe00000 s10: ff5fffffffe01000
         s11: 0000000000000022 t3 : 00ffffffaa17db4c t4 : 000000000000000f
         t5 : 0000000000000001 t6 : 0000000000000000
        status: 0000000000000100 badaddr: ff5fffffffe00000 cause: 000000000000000d
          scan_gray_list+0x12e/0x1a6
          kmemleak_scan+0x2aa/0x57e
          kmemleak_write+0x32a/0x40c
          full_proxy_write+0x56/0x82
          vfs_write+0xa6/0x2a6
          ksys_write+0x6c/0xe2
          sys_write+0x22/0x2a
          ret_from_syscall+0x0/0x2
      
      Callers may not know the actual address they pass (e.g. one taken from
      the devicetree), so the kmemleak_*_phys() APIs should guarantee that the
      address they finally use is within the lowmem range.  Check the address
      against lowmem's minimum boundary as well.
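
A minimal user-space sketch of the range check described above, not the kernel patch itself. The page shift, the two bounds and the helper name `phys_in_lowmem()` are assumptions made for illustration; in the kernel the bounds would come from the platform's lowmem limits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the real value is architecture-specific. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical lowmem bounds, expressed as page frame numbers. */
static const uint64_t sketch_min_low_pfn = 0x80000; /* lowmem starts at 2 GiB */
static const uint64_t sketch_max_low_pfn = 0xC0000; /* lowmem ends at 3 GiB */

/* Return true only if phys lies inside [min_low_pfn, max_low_pfn). */
static bool phys_in_lowmem(uint64_t phys)
{
    uint64_t pfn = phys >> SKETCH_PAGE_SHIFT;

    /* Checking only the upper bound would accept addresses below lowmem. */
    return pfn >= sketch_min_low_pfn && pfn < sketch_max_low_pfn;
}
```

With these assumed bounds, an address just below 2 GiB is rejected, which is the case the oops above corresponds to.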
      
      Link: https://lkml.kernel.org/r/20220413122925.33856-1-patrick.wang.shcn@gmail.com
      Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore · c12cd77c
      Omar Sandoval authored
      Commit 3ee48b6a ("mm, x86: Saving vmcore with non-lazy freeing of
      vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
      lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
      purge the vmap areas instead of doing it lazily.
      
      Commit 690467c8 ("mm/vmalloc: Move draining areas out of caller
      context") moved the purging from the vunmap() caller to a worker thread.
      Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
      (possibly forever).  For example, consider the following scenario:
      
       1. Thread reads from /proc/vmcore. This eventually calls
          __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
          vmap_lazy_nr to lazy_max_pages() + 1.
      
       2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
          pages (one page plus the guard page) to the purge list and
          vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
          drain_vmap_work is scheduled.
      
       3. Thread returns from the kernel and is scheduled out.
      
       4. Worker thread is scheduled in and calls drain_vmap_area_work(). It
          frees the 2 pages on the purge list. vmap_lazy_nr is now
          lazy_max_pages() + 1.
      
       5. This is still over the threshold, so it tries to purge areas again,
          but doesn't find anything.
      
       6. Repeat 5.
      
      If the system is running with only one CPU (which is typical for kdump)
      and preemption is disabled, then this will never make forward progress:
      there aren't any more pages to purge, so it hangs.  If there is more
      than one CPU or preemption is enabled, then the worker thread will spin
      forever in the background.  (Note that if there were already pages to be
      purged at the time that set_iounmap_nonlazy() was called, this bug is
      avoided.)
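
The scenario above can be modelled with a counter and a purge list. This is a simplified user-space sketch, not kernel code: `SKETCH_LAZY_MAX`, `struct vmap_model` and the `model_*` helpers are hypothetical names, and the worker is capped at `max_rounds` passes so the spin can be observed without hanging.

```c
#include <stdbool.h>

/* Hypothetical threshold standing in for lazy_max_pages(). */
#define SKETCH_LAZY_MAX 1024L

struct vmap_model {
    long lazy_nr;      /* models vmap_lazy_nr */
    long purge_pages;  /* pages actually sitting on the purge list */
};

/* Models set_iounmap_nonlazy(): bump the counter without queuing any pages. */
static void model_set_nonlazy(struct vmap_model *m)
{
    m->lazy_nr = SKETCH_LAZY_MAX + 1;
}

/* Models free_vmap_area_noflush(): queue pages and count them. */
static void model_free_area(struct vmap_model *m, long pages)
{
    m->purge_pages += pages;
    m->lazy_nr += pages;
}

/*
 * Models the drain worker: purge while the counter is over the threshold.
 * Returns true if the counter dropped to the threshold or below within
 * max_rounds passes, false if the worker would still be spinning.
 */
static bool model_drain(struct vmap_model *m, int max_rounds)
{
    for (int round = 0; round < max_rounds; round++) {
        if (m->lazy_nr <= SKETCH_LAZY_MAX)
            return true;
        /* Only pages on the purge list can be subtracted. */
        m->lazy_nr -= m->purge_pages;
        m->purge_pages = 0;
    }
    return m->lazy_nr <= SKETCH_LAZY_MAX;
}
```

In this model, calling `model_set_nonlazy()` followed by `model_free_area(&m, 2)` leaves the counter at the threshold plus one after the first purge, and no later pass can lower it.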
      
      This can be reproduced with anything that reads from /proc/vmcore
      multiple times.  E.g., vmcore-dmesg /proc/vmcore.
      
      It turns out that improvements to vmap() over the years have obsoleted
      the need for this "optimization".  I benchmarked `dd if=/proc/vmcore
      of=/dev/null` with 4k and 1M read sizes on a system with a 32GB vmcore.
      The test was run on 5.17, 5.18-rc1 with a fix that avoided the hang, and
      5.18-rc1 with set_iounmap_nonlazy() removed entirely:
      
          |5.17  |5.18+fix|5.18+removal
        4k|40.86s|  40.09s|      26.73s
        1M|24.47s|  23.98s|      21.84s
      
      The removal was the fastest (by a wide margin with 4k reads).  This
      patch removes set_iounmap_nonlazy().
      
      Link: https://lkml.kernel.org/r/52f819991051f9b865e9ce25605509bfdbacadcd.1649277321.git.osandov@fb.com
      Fixes: 690467c8 ("mm/vmalloc: Move draining areas out of caller context")
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Acked-by: Chris Down <chris@chrisdown.name>
      Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE" · aeb79237
      Andrew Morton authored
      Despite Mike's attempted fix (925346c1), regression reports
      continue:
      
        https://lore.kernel.org/lkml/cb5b81bd-9882-e5dc-cd22-54bdbaaefbbc@leemhuis.info/
        https://bugzilla.kernel.org/show_bug.cgi?id=215720
        https://lkml.kernel.org/r/b685f3d0-da34-531d-1aa9-479accd3e21b@leemhuis.info
      
      So revert this patch.
      
      Fixes: 9630f0d6 ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE")
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Kennelly <ckennelly@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Fangrui Song <maskray@google.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Sandeep Patil <sspatil@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders" · 354e923d
      Andrew Morton authored
      Commit 925346c1 ("fs/binfmt_elf: fix PT_LOAD p_align values for
      loaders") was an attempt to fix regressions due to 9630f0d6
      ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE").
      
      But regressions continue to be reported:
      
        https://lore.kernel.org/lkml/cb5b81bd-9882-e5dc-cd22-54bdbaaefbbc@leemhuis.info/
        https://bugzilla.kernel.org/show_bug.cgi?id=215720
        https://lkml.kernel.org/r/b685f3d0-da34-531d-1aa9-479accd3e21b@leemhuis.info
      
      This patch reverts the fix, so the original can also be reverted.
      
      Fixes: 925346c1 ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders")
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Chris Kennelly <ckennelly@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Sandeep Patil <sspatil@google.com>
      Cc: Fangrui Song <maskray@google.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: do not demote poisoned hugetlb pages · 5a317412
      Mike Kravetz authored
      It is possible for poisoned hugetlb pages to reside on the free lists.
      The huge page allocation routines which dequeue entries from the free
      lists make a point of avoiding poisoned pages.  There is no such check
      and avoidance in the demote code path.
      
      If a hugetlb page is on a free list, poison will only be set in the
      head page rather than in the page with the actual error.  If such a
      page is demoted, then the poison flag may follow the wrong page.  A page
      without an error could have poison set, and a page with an error could
      be left without the flag.
      
      Check for poison before attempting to demote a hugetlb page.  Also,
      return -EBUSY to the caller if only poisoned pages are on the free list.
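
A rough sketch of the check, assuming a simple singly linked free list. `struct free_hpage` and `pick_demote_candidate()` are hypothetical stand-ins, not the kernel's data structures or functions, and returning -EBUSY for an empty list is a simplification of this sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a free hugetlb page; not the kernel's struct page. */
struct free_hpage {
    bool poisoned;            /* models the poison flag on the head page */
    struct free_hpage *next;  /* next entry on the free list */
};

/*
 * Pick the first free page that is safe to demote.
 * Returns 0 and stores the page in *out on success, -EBUSY if the list is
 * empty or holds only poisoned pages.
 */
static int pick_demote_candidate(struct free_hpage *free_list,
                                 struct free_hpage **out)
{
    for (struct free_hpage *p = free_list; p != NULL; p = p->next) {
        if (p->poisoned)
            continue;  /* never split a page that carries poison */
        *out = p;
        return 0;
    }
    return -EBUSY;
}
```

Skipping poisoned entries before any split keeps the flag attached to the undivided huge page, so it cannot end up on the wrong smaller page.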
      
      Link: https://lkml.kernel.org/r/20220307215707.50916-1-mike.kravetz@oracle.com
      Fixes: 8531fc6f ("hugetlb: add hugetlb demote page support")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: compaction: fix compiler warning when CONFIG_COMPACTION=n · 31ca72fa
      Charan Teja Kalla authored
      The below warning is reported when CONFIG_COMPACTION=n:
      
         mm/compaction.c:56:27: warning: 'HPAGE_FRAG_CHECK_INTERVAL_MSEC' defined but not used [-Wunused-const-variable=]
            56 | static const unsigned int HPAGE_FRAG_CHECK_INTERVAL_MSEC = 500;
               |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Fix it by moving 'HPAGE_FRAG_CHECK_INTERVAL_MSEC' under the
      CONFIG_COMPACTION #ifdef.
      
      Also since this is just a 'static const int' type, use #define for it.
      
      Link: https://lkml.kernel.org/r/1647608518-20924-1-git-send-email-quic_charante@quicinc.com
      Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Nitin Gupta <nigupta@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix unexpected zeroed page mapping with zram swap · e914d8f0
      Minchan Kim authored
      When two processes share an address space via CLONE_VM, a user process
      can be corrupted by unexpectedly seeing a zeroed page.
      
            CPU A                        CPU B
      
        do_swap_page                do_swap_page
        SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
        swap_readpage valid data
          swap_slot_free_notify
            delete zram entry
                                    swap_readpage zeroed(invalid) data
                                    pte_lock
                                    map the *zero data* to userspace
                                    pte_unlock
        pte_lock
        if (!pte_same)
          goto out_nomap;
        pte_unlock
        return and next refault will
        read zeroed data
      
      swap_slot_free_notify is bogus in the CLONE_VM case: copy_mm does not
      increase the refcount of the swap slot, so the notifier cannot tell
      whether it is safe to discard data from the backing device.  In that
      case, the only lock that can be relied on to synchronize swap slot
      freeing is the page table lock.  Thus, this patch gets rid of the
      swap_slot_free_notify function.  With this patch, CPU A will see
      correct data.
      
            CPU A                        CPU B
      
        do_swap_page                do_swap_page
        SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
                                    swap_readpage original data
                                    pte_lock
                                    map the original data
                                    swap_free
                                      swap_range_free
                                        bd_disk->fops->swap_slot_free_notify
        swap_readpage read zeroed data
                                    pte_unlock
        pte_lock
        if (!pte_same)
          goto out_nomap;
        pte_unlock
        return
        on next refault will see mapped data by CPU B
      
      The concern with this patch is increased memory consumption, since
      data could be kept both in compressed form in zram and in uncompressed
      form in the address space.  However, most zram setups use no readahead
      and do_swap_page is followed by swap_free, so the compressed form will
      be freed from zram quickly.
      
      Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
      Fixes: 0bcac06f ("mm, swap: skip swapcache for swapin of synchronous device")
      Reported-by: Ivan Babrou <ivan@cloudflare.com>
      Tested-by: Ivan Babrou <ivan@cloudflare.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page_alloc: fix build_zonerefs_node() · e553f62f
      Juergen Gross authored
      Since commit 6aa303de ("mm, vmscan: only allocate and reclaim from
      zones with pages managed by the buddy allocator") only zones with free
      memory are included in a built zonelist.  This is problematic when e.g.
      all memory of a zone has been ballooned out when zonelists are being
      rebuilt.
      
      The decision whether to rebuild the zonelists when onlining new memory
      is based on populated_zone() returning 0 for the zone the memory will
      be added to.  The new zone is added to the zonelists only if it has
      free memory pages (managed_zone() returns a non-zero value) after the
      memory has been onlined.  This implies that onlining memory will always
      free the added pages to the allocator immediately, but this is not true
      in all cases: when e.g. running as a Xen guest, the onlined new memory
      will be added only to the ballooned memory list, and it will be freed
      only when the guest is being ballooned up afterwards.

      Another problem with using managed_zone() to decide whether a zone is
      added to the zonelists is that a zone with all memory used will in fact
      be removed from all zonelists in case the zonelists happen to be
      rebuilt.
      
      Use populated_zone() when building a zonelist as it has been done before
      that commit.
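
To make the difference between the two predicates concrete, here is a small user-space sketch. `struct zone_model` and the `model_*` functions are hypothetical reductions written for this example. They assume that the populated test looks at pages present in the zone and the managed test at pages currently handed to the allocator.

```c
#include <stdbool.h>

/* Hypothetical, reduced zone descriptor; field names follow the kernel's. */
struct zone_model {
    unsigned long present_pages;  /* pages physically present in the zone */
    unsigned long managed_pages;  /* pages currently handed to the buddy allocator */
};

static bool model_populated_zone(const struct zone_model *z)
{
    return z->present_pages != 0;
}

static bool model_managed_zone(const struct zone_model *z)
{
    return z->managed_pages != 0;
}

/* Count how many zones a zonelist would contain under the chosen test. */
static int model_build_zonelist(const struct zone_model *zones, int nr,
                                bool use_populated)
{
    int included = 0;

    for (int i = 0; i < nr; i++) {
        bool keep = use_populated ? model_populated_zone(&zones[i])
                                  : model_managed_zone(&zones[i]);
        if (keep)
            included++;
    }
    return included;
}
```

A zone whose memory is present but fully ballooned out has `managed_pages == 0`, so it is dropped under the managed test and kept under the populated test.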
      
      There was a report that QubesOS (based on Xen) is hitting this problem.
      Xen has switched to use the zone device functionality in kernel 5.9 and
      QubesOS wants to use memory hotplugging for guests in order to be able
      to start a guest with minimal memory and expand it as needed.  This was
      the report leading to the patch.
      
      Link: https://lkml.kernel.org/r/20220407120637.9035-1-jgross@suse.com
      Fixes: 6aa303de ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, kfence: support kmem_dump_obj() for KFENCE objects · 2dfe63e6
      Marco Elver authored
      Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been
      producing garbage data due to the object not actually being maintained
      by SLAB or SLUB.
      
      Fix this by implementing __kfence_obj_info() that copies relevant
      information to struct kmem_obj_info when the object was allocated by
      KFENCE; this is called by a common kmem_obj_info(), which also calls the
      slab/slub/slob specific variant now called __kmem_obj_info().
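
The dispatch described above might look roughly like the following user-space sketch. Everything here is a hypothetical stand-in (the pool array, `struct obj_info_model` and the `model_*` functions), and the order of the two calls is an assumption of the sketch rather than something the commit message states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool standing in for the KFENCE pool. */
static char sketch_kfence_pool[4096];

/* Reduced stand-in for struct kmem_obj_info. */
struct obj_info_model {
    const void *object;
    bool from_kfence;
};

/* Fill *info and return true only if the object lives in the KFENCE pool. */
static bool model_kfence_obj_info(struct obj_info_model *info, const void *object)
{
    uintptr_t addr = (uintptr_t)object;
    uintptr_t start = (uintptr_t)sketch_kfence_pool;

    if (addr < start || addr >= start + sizeof(sketch_kfence_pool))
        return false;
    info->object = object;
    info->from_kfence = true;
    return true;
}

/* Stand-in for the allocator-specific variant. */
static void model_slab_obj_info(struct obj_info_model *info, const void *object)
{
    info->object = object;
    info->from_kfence = false;
}

/* Common entry point: ask KFENCE first, fall back to the slab allocator. */
static void model_kmem_obj_info(struct obj_info_model *info, const void *object)
{
    if (model_kfence_obj_info(info, object))
        return;
    model_slab_obj_info(info, object);
}
```

Routing every lookup through one common entry point means the allocator-specific code never interprets an object it does not own.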
      
      For completeness, kmem_dump_obj() now displays if the object was
      allocated by KFENCE.
      
      Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/
      Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com
      Fixes: b89fb5ef ("mm, kfence: insert KFENCE hooks for SLUB")
      Fixes: d3fb45f3 ("mm, kfence: insert KFENCE hooks for SLAB")
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>	[slab]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan: fix hw tags enablement when KUNIT tests are disabled · b1add418
      Vincenzo Frascino authored
      KASAN enables hw tags via kasan_enable_tagging(), which selects the
      correct hw backend based on the mode passed via the kernel command line.
      kasan_enable_tagging() is meant to be invoked indirectly via the cpu
      features framework of the architectures that support these backends.
      Currently the invocation of this function is guarded by
      CONFIG_KASAN_KUNIT_TEST, which allows the enablement of the correct
      backend only when KUNIT tests are enabled in the kernel.
      
      This inconsistency was introduced in commit:
      
        ed6d7444 ("kasan: test: support async (again) and asymm modes for HW_TAGS")
      
      ... and prevents MTE from being enabled on arm64 when KUNIT tests for
      kasan hw_tags are disabled.

      Fix the issue by making sure that the CONFIG_KASAN_KUNIT_TEST guard does
      not prevent the correct invocation of kasan_enable_tagging().
      
      Link: https://lkml.kernel.org/r/20220408124323.10028-1-vincenzo.frascino@arm.com
      Fixes: ed6d7444 ("kasan: test: support async (again) and asymm modes for HW_TAGS")
      Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • irq_work: use kasan_record_aux_stack_noalloc() record callstack · 25934fcf
      Zqiang authored
      On a PREEMPT_RT kernel with KASAN enabled, kasan_record_aux_stack() may
      call alloc_pages(), which acquires an rt-spinlock.  If this happens in
      atomic context, it triggers a warning:
      
        BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
        in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 239, name: bootlogd
        Preemption disabled at:
        [<ffffffffbab1a531>] rt_mutex_slowunlock+0xa1/0x4e0
        CPU: 3 PID: 239 Comm: bootlogd Tainted: G        W 5.17.1-rt17-yocto-preempt-rt+ #105
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
        Call Trace:
           __might_resched.cold+0x13b/0x173
           rt_spin_lock+0x5b/0xf0
           get_page_from_freelist+0x20c/0x1610
           __alloc_pages+0x25e/0x5e0
           __stack_depot_save+0x3c0/0x4a0
           kasan_save_stack+0x3a/0x50
           __kasan_record_aux_stack+0xb6/0xc0
           kasan_record_aux_stack+0xe/0x10
           irq_work_queue_on+0x6a/0x1c0
           pull_rt_task+0x631/0x6b0
           do_balance_callbacks+0x56/0x80
           __balance_callbacks+0x63/0x90
           rt_mutex_setprio+0x349/0x880
           rt_mutex_slowunlock+0x22a/0x4e0
           rt_spin_unlock+0x49/0x80
           uart_write+0x186/0x2b0
           do_output_char+0x2e9/0x3a0
           n_tty_write+0x306/0x800
           file_tty_write.isra.0+0x2af/0x450
           tty_write+0x22/0x30
           new_sync_write+0x27c/0x3a0
           vfs_write+0x3f7/0x5d0
           ksys_write+0xd9/0x180
           __x64_sys_write+0x43/0x50
           do_syscall_64+0x44/0x90
           entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fix it by using kasan_record_aux_stack_noalloc() to avoid the call to
      alloc_pages().
      
      Link: https://lkml.kernel.org/r/20220402142555.2699582-1-qiang1.zhang@intel.com
      Signed-off-by: Zqiang <qiang1.zhang@intel.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/secretmem: fix panic when growing a memfd_secret · f9b141f9
      Axel Rasmussen authored
      When one tries to grow an existing memfd_secret with ftruncate, one gets
      a panic [1].  For example, doing the following reliably induces the
      panic:
      
          fd = memfd_secret();
      
          ftruncate(fd, 10);
          ptr = mmap(NULL, 10, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          strcpy(ptr, "123456789");
      
          munmap(ptr, 10);
          ftruncate(fd, 20);
      
      The basic reason for this is, when we grow with ftruncate, we call down
      into simple_setattr, and then truncate_inode_pages_range, and eventually
      we try to zero part of the memory.  The normal truncation code does this
      via the direct map (i.e., it calls page_address() and hands that to
      memset()).
      
      For memfd_secret though, we specifically don't map our pages via the
      direct map (i.e.  we call set_direct_map_invalid_noflush() on every
      fault).  So the address returned by page_address() isn't useful, and
      when we try to memset() with it we panic.
      
      This patch avoids the panic by implementing a custom setattr for
      memfd_secret, which detects resizes specifically (setting the size for
      the first time works just fine, since there are no existing pages to try
      to zero), and rejects them with EINVAL.
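
A minimal sketch of the resize check described above, written as a user-space model. `struct inode_model`, `struct attr_model` and `model_secretmem_setattr()` are hypothetical names for illustration only. The sketch assumes a non-zero current size is what marks a request as a resize.

```c
#include <errno.h>
#include <stdbool.h>

/* Reduced stand-ins for the inode and the attribute-change request. */
struct inode_model {
    long long size;  /* models i_size */
};

struct attr_model {
    bool set_size;       /* models a size change being requested */
    long long new_size;
};

/*
 * Allow the first size assignment, reject any later resize with -EINVAL.
 * An inode whose size is still 0 has no pages that would need zeroing.
 */
static int model_secretmem_setattr(struct inode_model *inode,
                                   const struct attr_model *attr)
{
    if (attr->set_size && inode->size != 0)
        return -EINVAL;
    if (attr->set_size)
        inode->size = attr->new_size;
    return 0;
}
```

Applied to the reproducer above, the first ftruncate(fd, 10) would succeed and the later ftruncate(fd, 20) would fail with EINVAL instead of reaching the zeroing path.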
      
      One could argue growing should be supported, but I think that will
      require a significantly more lengthy change.  So, I propose a minimal
      fix for the benefit of stable kernels, and then perhaps to extend
      memfd_secret to support growing in a separate patch.
      
      [1]:
      
        BUG: unable to handle page fault for address: ffffa0a889277028
        #PF: supervisor write access in kernel mode
        #PF: error_code(0x0002) - not-present page
        PGD afa01067 P4D afa01067 PUD 83f909067 PMD 83f8bf067 PTE 800ffffef6d88060
        Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        CPU: 0 PID: 281 Comm: repro Not tainted 5.17.0-dbg-DEV #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
        RIP: 0010:memset_erms+0x9/0x10
        Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
        RSP: 0018:ffffb932c09afbf0 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffffda63c4249dc0 RCX: 0000000000000fd8
        RDX: 0000000000000fd8 RSI: 0000000000000000 RDI: ffffa0a889277028
        RBP: ffffb932c09afc00 R08: 0000000000001000 R09: ffffa0a889277028
        R10: 0000000000020023 R11: 0000000000000000 R12: ffffda63c4249dc0
        R13: ffffa0a890d70d98 R14: 0000000000000028 R15: 0000000000000fd8
        FS:  00007f7294899580(0000) GS:ffffa0af9bc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffffa0a889277028 CR3: 0000000107ef6006 CR4: 0000000000370ef0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         ? zero_user_segments+0x82/0x190
         truncate_inode_partial_folio+0xd4/0x2a0
         truncate_inode_pages_range+0x380/0x830
         truncate_setsize+0x63/0x80
         simple_setattr+0x37/0x60
         notify_change+0x3d8/0x4d0
         do_sys_ftruncate+0x162/0x1d0
         __x64_sys_ftruncate+0x1c/0x20
         do_syscall_64+0x44/0xa0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        Modules linked in: xhci_pci xhci_hcd virtio_net net_failover failover virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_hcd evdev ehci_pci ehci_hcd 9pnet_virtio 9p netfs 9pnet
        CR2: ffffa0a889277028
      
      [lkp@intel.com: secretmem_iops can be static]
      Signed-off-by: kernel test robot <lkp@intel.com>
      [axelrasmussen@google.com: return EINVAL]
      
      Link: https://lkml.kernel.org/r/20220324210909.1843814-1-axelrasmussen@google.com
      Link: https://lkml.kernel.org/r/20220412193023.279320-1-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: kernel test robot <lkp@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: fix regressions from wider use of ZERO_PAGE · 1bdec44b
      Hugh Dickins authored
      Chuck Lever reported fsx-based xfstests generic 075 091 112 127 failing
      when 5.18-rc1 NFS server exports tmpfs: bisected to recent tmpfs change.
      
      Whilst nfsd_splice_action() does contain some questionable handling of
      repeated pages, and Chuck was able to work around there, history from
      Mark Hemment makes clear that there might be similar dangers elsewhere:
      it was not a good idea for me to pass ZERO_PAGE down to unknown actors.
      
      Revert shmem_file_read_iter() to using ZERO_PAGE for holes only when
      iter_is_iovec(); in other cases, use the more natural iov_iter_zero()
      instead of copy_page_to_iter().
      
      We would use iov_iter_zero() throughout, but the x86 clear_user() is not
      nearly so well optimized as copy to user (dd of 1T sparse tmpfs file
      takes 57 seconds rather than 44 seconds).
      
      And now pagecache_init() does not need to SetPageUptodate(ZERO_PAGE(0)),
      which had caused boot failure on arm noMMU STM32F7 and STM32H7 boards.
      
      Link: https://lkml.kernel.org/r/9a978571-8648-e830-5735-1f4748ce2e30@google.com
      Fixes: 56a8c8eb ("tmpfs: do not allocate pages on read")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reported-by: Patrice CHOTARD <patrice.chotard@foss.st.com>
      Reported-by: Chuck Lever III <chuck.lever@oracle.com>
      Tested-by: Chuck Lever III <chuck.lever@oracle.com>
      Cc: Mark Hemment <markhemm@googlemail.com>
      Cc: Patrice CHOTARD <patrice.chotard@foss.st.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "Darrick J. Wong" <djwong@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1bdec44b
    • Joe Perches's avatar
      MAINTAINERS: Broadcom internal lists aren't maintainers · 7fbd166a
      Joe Perches authored
      Convert the broadcom internal list M: and L: entries to R: as exploder
      email addresses are neither maintainers nor mailing lists.
      
      Reorder the entries as necessary.
      
      Link: https://lkml.kernel.org/r/04eb301f5b3adbefdd78e76657eff0acb3e3d87f.camel@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7fbd166a
    • Shin'ichiro Kawasaki's avatar
      dm: fix bio length of empty flush · 92b914e2
      Shin'ichiro Kawasaki authored
      Commit 92986f6b ("dm: use bio_clone_fast in alloc_io/alloc_tio")
      removed the bio_clone_fast() call from alloc_tio() when ci->io->tio is
      available. In this case, ci->bio is not copied to ci->io->tio.clone.
      This is fine since init_clone_info() sets the same values in ci->bio
      and ci->io->tio.clone.
      
      However, when incoming bios have the REQ_PREFLUSH flag,
      __send_empty_flush() prepares a zero-length bio on the stack and sets
      it as ci->bio. At this time, ci->io->tio.clone still has a non-zero
      length. When alloc_tio() chooses this ci->io->tio.clone as the bio to
      map, it is passed to targets as a non-empty flush bio. This causes a
      bio length check failure in dm-zoned and unexpected operations such as
      a dm_accept_partial_bio() call.
      
      To avoid the non-empty flush bio, set the length of ci->io->tio.clone
      to zero in __send_empty_flush().
      
      Fixes: 92986f6b ("dm: use bio_clone_fast in alloc_io/alloc_tio")
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      92b914e2
    • Linus Torvalds's avatar
      Merge tag 'block-5.18-2022-04-15' of git://git.kernel.dk/linux-block · fb649bda
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Moving of lower_48_bits() to the block layer and a fix for the
         unaligned_be48 added with that originally (Alexander, Keith)
      
       - Fix a bad WARN_ON() for trim size checking (Ming)
      
       - A polled IO timeout fix for null_blk (Ming)
      
       - Silence IO error printing for dead disks (Christoph)
      
       - Compat mode range fix (Khazhismel)
      
       - NVMe pull request via Christoph:
           - Tone down the error logging added this merge window a bit
             (Chaitanya Kulkarni)
           - Quirk devices with non-unique unique identifiers (Christoph)
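The first item above concerns helpers for 48-bit values. The following is a minimal userspace sketch of that kind of helper, not the kernel code itself: the `ex_` names are hypothetical, and the source does not describe the actual 32-bit bug, so the widening casts shown here are only an assumption about the class of problem involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keep only the low 48 bits of a 64-bit value (hypothetical stand-in). */
static inline uint64_t ex_lower_48_bits(uint64_t n)
{
	return n & ((1ULL << 48) - 1);
}

/* Read a 48-bit big-endian value from a possibly unaligned byte buffer. */
static inline uint64_t ex_get_unaligned_be48(const uint8_t *p)
{
	assert(p != NULL);
	/*
	 * Widen every byte to 64 bits before shifting, so that no shift is
	 * performed on a plain int (assumed to be the relevant pitfall).
	 */
	return (uint64_t)p[0] << 40 | (uint64_t)p[1] << 32 |
	       (uint64_t)p[2] << 24 | (uint64_t)p[3] << 16 |
	       (uint64_t)p[4] << 8  | (uint64_t)p[5];
}
```

Reading byte by byte avoids any alignment requirement on the buffer, at the cost of six loads instead of one.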
      
      * tag 'block-5.18-2022-04-15' of git://git.kernel.dk/linux-block:
        block: don't print I/O error warning for dead disks
        block/compat_ioctl: fix range check in BLKGETSIZE
        nvme-pci: disable namespace identifiers for Qemu controllers
        nvme-pci: disable namespace identifiers for the MAXIO MAP1002/1202
        nvme: add a quirk to disable namespace identifiers
        nvme: don't print verbose errors for internal passthrough requests
        block: null_blk: end timed out poll request
        block: fix offset/size check in bio_trim()
        asm-generic: fix __get_unaligned_be48() on 32 bit platforms
        block: move lower_48_bits() to block
      fb649bda
    • Linus Torvalds's avatar
      Merge tag 'io_uring-5.18-2022-04-14' of git://git.kernel.dk/linux-block · 0647b9cc
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Ensure we check and -EINVAL any use of reserved or struct padding.
      
         Although we generally always do that, it's missed in two spots for
         resource updates, one for the ring fd registration from this merge
         window, and one for the extended arg. Make sure we have all of them
         handled. (Dylan)
      
       - A few fixes for the deferred file assignment (me, Pavel)
      
       - Add a feature flag for the deferred file assignment so apps can tell
         we handle it correctly (me)
      
       - Fix a small perf regression with the current file position fix in
         this merge window (me)
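The reserved-field checks in the first item can be illustrated with a small userspace sketch. The struct layout and the `example_` names are hypothetical and are not the io_uring ABI; only the idea of rejecting non-zero reserved or padding fields with -EINVAL comes from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request layout; field names are illustrative only. */
struct example_update {
	uint32_t offset;
	uint32_t resv;   /* reserved, must be zero */
	uint64_t data;
	uint32_t nr;
	uint32_t resv2;  /* reserved, must be zero */
};

/* Return 0 if the request is acceptable, -EINVAL if a reserved field is set. */
static int example_validate(const struct example_update *up)
{
	assert(up != NULL);
	if (up->resv || up->resv2)
		return -EINVAL;
	return 0;
}
```

Rejecting non-zero reserved fields up front keeps them available for future use: an application that accidentally passes garbage there fails immediately rather than silently changing behaviour on a later kernel.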
      
      * tag 'io_uring-5.18-2022-04-14' of git://git.kernel.dk/linux-block:
        io_uring: abort file assignment prior to assigning creds
        io_uring: fix poll error reporting
        io_uring: fix poll file assign deadlock
        io_uring: use right issue_flags for splice/tee
        io_uring: verify pad field is 0 in io_get_ext_arg
        io_uring: verify resv is 0 in ringfd register/unregister
        io_uring: verify that resv2 is 0 in io_uring_rsrc_update2
        io_uring: move io_uring_rsrc_update2 validation
        io_uring: fix assign file locking issue
        io_uring: stop using io_wq_work as an fd placeholder
        io_uring: move apoll->events cache
        io_uring: io_kiocb_update_pos() should not touch file for non -1 offset
        io_uring: flag the fact that linked file assignment is sane
      0647b9cc
    • Linus Torvalds's avatar
      Merge tag 'linux-kselftest-fixes-5.18-rc3' of... · bb34e0db
      Linus Torvalds authored
      Merge tag 'linux-kselftest-fixes-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest fixes from Shuah Khan:
       "A mqueue perf test memory leak bug fix.
      
        mq_perf_tests failed to call CPU_FREE to free memory allocated by
        CPU_SET"
      
      * tag 'linux-kselftest-fixes-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set
      bb34e0db
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v5.18-2022-04-14' of... · e2dec488
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.18-2022-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - 'perf record --per-thread' mode doesn't have the CPU mask set up, so
         it cannot use it to figure out the number of mmaps, fix it.
      
       - Fix segfault accessing sample_id xyarray out of bounds, noticed while
         using Intel PT where we have a dummy event to capture text poke perf
         metadata events and we mix up the set of CPUs specified by the user
         with the all CPUs map needed for text poke.
      
       - Fix 'perf bench numa' to check if CPU used to bind task is online.
      
       - Fix 'perf bench numa' usage of affinity for machines with more than
         1000 CPUs.
      
       - Fix misleading add event PMU debug message, noticed while using the
        'intel_pt' PMU.
      
       - Fix error check return value of hashmap__new() in 'perf stat', it
         must use IS_ERR().
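The last item refers to the error-pointer convention, where a constructor returns an encoded errno value instead of NULL. Below is a minimal userspace sketch of that convention. All `ex_` names are hypothetical stand-ins written for this example; they are not the perf or kernel helpers, and the 4095 limit is an assumption borrowed from the usual form of this idiom.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define EX_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ex_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the top EX_MAX_ERRNO addresses. */
static inline int ex_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-EX_MAX_ERRNO;
}

/* Decode an error pointer back into its negative errno value. */
static inline long ex_ptr_err(const void *ptr)
{
	assert(ex_is_err(ptr));
	return (long)(intptr_t)ptr;
}

struct ex_map { int unused; };

/* Hypothetical constructor: on failure it returns an error pointer, never NULL. */
static struct ex_map *ex_map_new(int fail)
{
	if (fail)
		return ex_err_ptr(-ENOMEM);
	struct ex_map *m = calloc(1, sizeof(*m));
	return m ? m : ex_err_ptr(-ENOMEM);
}

/* Caller: a plain `!m` test would miss the failure, so check ex_is_err(). */
static int ex_use_map(int fail)
{
	struct ex_map *m = ex_map_new(fail);
	if (ex_is_err(m))
		return (int)ex_ptr_err(m);
	free(m);
	return 0;
}
```

With this convention a NULL check on the result is always false on failure, which is why the caller has to test for an error pointer instead.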
      
      * tag 'perf-tools-fixes-for-v5.18-2022-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf bench: Fix numa bench to fix usage of affinity for machines with #CPUs > 1K
        perf bench: Fix numa testcase to check if CPU used to bind task is online
        perf record: Fix per-thread option
        perf tools: Fix segfault accessing sample_id xyarray
        perf stat: Fix error check return value of hashmap__new(), must use IS_ERR()
        perf tools: Fix misleading add event PMU debug message
      e2dec488
    • Jens Axboe's avatar
      Merge tag 'nvme-5.18-2022-04-15' of git://git.infradead.org/nvme into block-5.18 · 89a2ee91
      Jens Axboe authored
      Pull NVMe fixes from Christoph:
      
      "nvme fixes for Linux 5.18
      
       - tone down the error logging added this merge window a bit
         (Chaitanya Kulkarni)
       - quirk devices with non-unique unique identifiers (me)"
      
      * tag 'nvme-5.18-2022-04-15' of git://git.infradead.org/nvme:
        nvme-pci: disable namespace identifiers for Qemu controllers
        nvme-pci: disable namespace identifiers for the MAXIO MAP1002/1202
        nvme: add a quirk to disable namespace identifiers
        nvme: don't print verbose errors for internal passthrough requests
      89a2ee91
    • Christoph Hellwig's avatar
      block: don't print I/O error warning for dead disks · 3d973a76
      Christoph Hellwig authored
      When a disk has been marked dead, don't print warnings for I/O errors
      as they are very much expected.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220323163815.1526998-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3d973a76
    • Khazhismel Kumykov's avatar
      block/compat_ioctl: fix range check in BLKGETSIZE · ccf16413
      Khazhismel Kumykov authored
      Kernel ulong and compat_ulong_t may not be the same width. Use the type
      directly to eliminate mismatches.
      
      This would result in truncation rather than EFBIG for 32bit mode for
      large disks.
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Link: https://lore.kernel.org/r/20220414224056.2875681-1-khazhy@google.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ccf16413
    • Christoph Hellwig's avatar
      nvme-pci: disable namespace identifiers for Qemu controllers · 66dd346b
      Christoph Hellwig authored
      Qemu unconditionally reports a UUID, which depending on the qemu version
      is either all-null (which is incorrect but harmless) or contains a single
      bit set for all controllers.  In addition it can also optionally report
      an eui64 which needs to be manually set.  Disable namespace identifiers
      for Qemu controllers entirely even if in some cases they could be set
      correctly through manual intervention.
      Reported-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      66dd346b
    • Christoph Hellwig's avatar
      nvme-pci: disable namespace identifiers for the MAXIO MAP1002/1202 · a98a945b
      Christoph Hellwig authored
      The MAXIO MAP1002/1202 controllers report completely bogus namespace
      identifiers that even change after suspend cycles.  Disable using
      the identifiers entirely.
      Reported-by: 金韬 <me@kingtous.cn>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Tested-by: 金韬 <me@kingtous.cn>
      a98a945b
    • Christoph Hellwig's avatar
      nvme: add a quirk to disable namespace identifiers · 00ff400e
      Christoph Hellwig authored
      Add a quirk to disable using and exporting namespace identifiers for
      controllers where they are broken beyond repair.
      
      The most directly visible problem with non-unique namespace identifiers
      is that they break the /dev/disk/by-id/ links, with the link for a
      supposedly unique identifier now pointing to one of multiple possible
      namespaces that share the same ID, and a somewhat random selection of
      which one actually shows up.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      00ff400e
    • Chaitanya Kulkarni's avatar
      nvme: don't print verbose errors for internal passthrough requests · b42b6f44
      Chaitanya Kulkarni authored
      Use the RQF_QUIET flag to skip the newly added verbose error reporting,
      and set the flag in __nvme_submit_sync_cmd, which is used for most
      internal passthrough requests where we do expect errors (e.g. due to
      probing for optional functionality).  This is similar to what the SCSI
      verbose error logging does.
      Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Alan Adamson <alan.adamson@oracle.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Tested-by: Alan Adamson <alan.adamson@oracle.com>
      Tested-by: Yi Zhang <yi.zhang@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      b42b6f44
    • Jens Axboe's avatar
      io_uring: abort file assignment prior to assigning creds · 70152140
      Jens Axboe authored
      We need to either restore creds properly if we fail on the file
      assignment, or just do the file assignment first instead. Let's do
      the latter as it's simpler, should make no difference here for
      file assignment.
      
      Link: https://lore.kernel.org/lkml/000000000000a7edb305dca75a50@google.com/
      Reported-by: syzbot+60c52ca98513a8760a91@syzkaller.appspotmail.com
      Fixes: 6bf9c47a ("io_uring: defer file assignment")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      70152140
    • Mike Snitzer's avatar
      dm: allow dm_accept_partial_bio() for dm_io without duplicate bios · 7dd06a25
      Mike Snitzer authored
      The intent behind commit e6fc9f62 ("dm: flag clones created by
      __send_duplicate_bios") was to formally disallow the use of
      dm_accept_partial_bio() where it simply isn't possible -- due to the
      constraint that multiple bios cannot meaningfully update a shared
      tio->len_ptr.
      
      But that commit went too far and disallowed the case where "abnormal"
      IO (e.g. WRITE_ZEROES) is only using a single bio.  Fix this by
      not marking a dm_io with a single dm_target_io (and bio), that happens
      to be created by __send_duplicate_bios, as DM_TIO_IS_DUPLICATE_BIO.
      Also remove 'unsigned *len' parameter from alloc_multiple_bios().
      
      This commit fixes a dm_accept_partial_bio() BUG_ON() with dm-zoned
      when a WRITE_ZEROES bio is issued.
      
      Fixes: 655f3aad ("dm: switch dm_target_io booleans over to proper flags")
      Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      7dd06a25
  2. 14 Apr, 2022 10 commits
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2022-04-15' of git://anongit.freedesktop.org/drm/drm · 028192fe
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Eggs season holidays are among us, and I think I'd expect some smaller
        pulls for two weeks then.
      
        This seems eerily quiet. One i915 fix, amdgpu has a bunch and msm. I
        didn't see a misc pull this week, so I expect that will catch up next
        week.
      
        i915:
         - Correct legacy mmap disabling to use GRAPHICS_VER_FULL
      
        msm:
         - system suspend fix
         - kzalloc return checks
         - misc display fix
         - iommu_present removal
      
        amdgpu:
         - Fix for alpha properly in pre-multiplied mode
         - Fix VCN 3.1.2 firmware name
         - Suspend/resume fix
         - Add a gfxoff quirk for Mac vega20 board
         - DCN 3.1.6 spread spectrum fix"
      
      * tag 'drm-fixes-2022-04-15' of git://anongit.freedesktop.org/drm/drm:
        drm/amd/display: remove dtbclk_ss compensation for dcn316
        drm/amdgpu: Enable gfxoff quirk on MacBook Pro
        drm/amdgpu: Ensure HDA function is suspended before ASIC reset
        drm/amdgpu: fix VCN 3.1.2 firmware name
        drm/amd/display: don't ignore alpha property on pre-multiplied mode
        drm/msm/gpu: Avoid -Wunused-function with !CONFIG_PM_SLEEP
        drm/msm/dp: add fail safe mode outside of event_mutex context
        drm/msm/dsi: Use connector directly in msm_dsi_manager_connector_init()
        drm/msm: Stop using iommu_present()
        drm/msm/mdp5: check the return of kzalloc()
        drm/msm: Fix range size vs end confusion
        drm/i915: Sunset igpu legacy mmap support based on GRAPHICS_VER_FULL
        drm/msm/dpu: Use indexed array initializer to prevent mismatches
        drm/msm/disp: check the return value of kzalloc()
        dt-bindings: display/msm: another fix for the dpu-qcm2290 example
        drm/msm: Add missing put_task_struct() in debugfs path
        drm/msm/gpu: Remove mutex from wait_event condition
        drm/msm/gpu: Park scheduler threads for system suspend
        drm/msm/gpu: Rename runtime suspend/resume functions
      028192fe
    • Linus Torvalds's avatar
      Merge tag 'vfio-v5.18-rc3' of https://github.com/awilliam/linux-vfio · 38a5e3fb
      Linus Torvalds authored
      Pull vfio fix from Alex Williamson:
      
       - Fix VF token checking for vfio-pci variant drivers (Jason Gunthorpe)
      
      * tag 'vfio-v5.18-rc3' of https://github.com/awilliam/linux-vfio:
        vfio/pci: Fix vf_token mechanism when device-specific VF drivers are used
      38a5e3fb
    • Linus Torvalds's avatar
      Merge tag '5.18-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 62345e48
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
      
       - two fixes related to unmount
      
       - symlink overflow fix
      
       - minor netfs fix
      
       - improved tracing for crediting (flow control)
      
      * tag '5.18-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: verify that tcon is valid before dereference in cifs_kill_sb
        cifs: potential buffer overflow in handling symlinks
        cifs: Split the smb3_add_credits tracepoint
        cifs: release cached dentries only if mount is complete
        cifs: Check the IOCB_DIRECT flag, not O_DIRECT
      62345e48
    • NeilBrown's avatar
      VFS: filename_create(): fix incorrect intent. · b3d4650d
      NeilBrown authored
      When asked to create a path ending in '/', but which is not to be a
      directory (LOOKUP_DIRECTORY not set), filename_create() will never try
      to create the file.  If it doesn't exist, -ENOENT is reported.
      
      However, it still passes LOOKUP_CREATE|LOOKUP_EXCL to the filesystem's
      ->lookup() function, even though there is no intent to create.  This is
      misleading and can cause incorrect behaviour.
      
      If you try
      
         ln -s foo /path/dir/
      
      where 'dir' is a directory on an NFS filesystem which is not currently
      known in the dcache, this will fail with ENOENT.
      
      But as the name is not in the dcache, nfs_lookup gets called with
      LOOKUP_CREATE|LOOKUP_EXCL and so it returns NULL without performing any
      lookup, with the expectation that a subsequent call to create the target
      will be made, and the lookup can be combined with the creation.  In the
      case with a trailing '/' and no LOOKUP_DIRECTORY, that call is never
      made.  Instead filename_create() sees that the dentry is not (yet)
      positive and returns -ENOENT - even though the directory actually
      exists.
      
      So only set LOOKUP_CREATE|LOOKUP_EXCL if there really is an intent to
      create, and use the absence of these flags to decide if -ENOENT should
      be returned.
      
      Note that filename_parentat() is only interested in LOOKUP_REVAL, so we
      split that out and store it in 'reval_flag'.  __lookup_hash() then gets
      reval_flag combined with whatever create flags were determined to be
      needed.
      Reviewed-by: David Disseldorp <ddiss@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b3d4650d
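The flag handling described in the filename_create() entry above can be modelled with a small userspace sketch. The `EX_` flag values and both helper functions are hypothetical and exist only for illustration; the real LOOKUP_* constants and the surrounding VFS code differ. The sketch only mirrors the stated rule: keep the revalidation flag separately, add the create flags only when there is a real intent to create, and use their absence to decide on -ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag values; the real LOOKUP_* constants differ. */
#define EX_LOOKUP_DIRECTORY 0x01u
#define EX_LOOKUP_REVAL     0x02u
#define EX_LOOKUP_CREATE    0x04u
#define EX_LOOKUP_EXCL      0x08u

/* Compute the flags handed to the final component lookup. */
static unsigned int ex_final_lookup_flags(unsigned int lookup_flags,
					  bool trailing_slash)
{
	/* In this model callers never pass the create flags themselves. */
	assert(!(lookup_flags & (EX_LOOKUP_CREATE | EX_LOOKUP_EXCL)));

	bool want_dir = lookup_flags & EX_LOOKUP_DIRECTORY;
	unsigned int reval_flag = lookup_flags & EX_LOOKUP_REVAL;
	unsigned int create_flags = EX_LOOKUP_CREATE | EX_LOOKUP_EXCL;

	/* A trailing '/' on a non-directory target means nothing will be created. */
	if (trailing_slash && !want_dir)
		create_flags = 0;

	return reval_flag | create_flags;
}

/* Outcome for a name that does not exist: 0 means "go on to create it". */
static int ex_missing_name_result(unsigned int final_flags)
{
	return (final_flags & EX_LOOKUP_CREATE) ? 0 : -ENOENT;
}
```

In this model the filesystem only sees the create flags when a creation will actually follow, so a lookup routine that defers work in anticipation of a create is never left waiting for a call that does not come.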
    • Dave Airlie's avatar
      Merge tag 'amd-drm-fixes-5.18-2022-04-13' of... · 8e401ff5
      Dave Airlie authored
      Merge tag 'amd-drm-fixes-5.18-2022-04-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
      
      amd-drm-fixes-5.18-2022-04-13:
      
      amdgpu:
      - Fix for alpha properly in pre-multiplied mode
      - Fix VCN 3.1.2 firmware name
      - Suspend/resume fix
      - Add a gfxoff quirk for Mac vega20 board
      - DCN 3.1.6 spread spectrum fix
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220414025821.5811-1-alexander.deucher@amd.com
      8e401ff5
    • Linus Torvalds's avatar
      Merge tag 's390-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 115acbb5
      Linus Torvalds authored
      Pull s390 fixes from Heiko Carstens:
      
       - Convert current_stack_pointer to a register alias like it is assumed
         if ARCH_HAS_CURRENT_STACK_POINTER is selected. The existing
         implementation as a function breaks CONFIG_HARDENED_USERCOPY
         sanity-checks
      
       - Get rid of -Warray-bounds warning within kexec code
      
       - Add minimal IBM z16 support by reporting a proper elf platform, and
         adding compile options
      
       - Update defconfigs
      
      * tag 's390-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: enable CONFIG_HARDENED_USERCOPY in debug_defconfig
        s390: current_stack_pointer shouldn't be a function
        s390: update defconfigs
        s390/kexec: silence -Warray-bounds warning
        s390: allow to compile with z16 optimizations
        s390: add z16 elf platform
      115acbb5
    • Linus Torvalds's avatar
      Merge tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d20339fa
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from wireless and netfilter.
      
        Current release - regressions:
      
         - smc: fix af_ops of child socket pointing to released memory
      
         - wifi: ath9k: fix usage of driver-private space in tx_info
      
        Previous releases - regressions:
      
         - ipv6: fix panic when forwarding a pkt with no in6 dev
      
         - sctp: use the correct skb for security_sctp_assoc_request
      
         - smc: fix NULL pointer dereference in smc_pnet_find_ib()
      
         - sched: fix initialization order when updating chain 0 head
      
         - phy: don't defer probe forever if PHY IRQ provider is missing
      
         - dsa: revert "net: dsa: setup master before ports"
      
         - dsa: felix: fix tagging protocol changes with multiple CPU ports
      
         - eth: ice:
            - fix use-after-free when freeing @rx_cpu_rmap
            - revert "iavf: fix deadlock occurrence during resetting VF
              interface"
      
         - eth: lan966x: stop processing the MAC entry if port is wrong
      
        Previous releases - always broken:
      
         - sched:
            - flower: fix parsing of ethertype following VLAN header
            - taprio: check if socket flags are valid
      
         - nfc: add flush_workqueue to prevent uaf
      
         - veth: ensure eth header is in skb's linear part
      
         - eth: stmmac: fix altr_tse_pcs function when using a fixed-link
      
         - eth: macb: restart tx only if queue pointer is lagging
      
         - eth: macvlan: fix leaking skb in source mode with nodst option"
      
      * tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
        net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
        rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies
        net: dsa: felix: fix tagging protocol changes with multiple CPU ports
        tun: annotate access to queue->trans_start
        nfc: nci: add flush_workqueue to prevent uaf
        net: dsa: realtek: don't parse compatible string for RTL8366S
        net: dsa: realtek: fix Kconfig to assure consistent driver linkage
        net: ftgmac100: access hardware register after clock ready
        Revert "net: dsa: setup master before ports"
        macvlan: Fix leaking skb in source mode with nodst option
        netfilter: nf_tables: nft_parse_register can return a negative value
        net: lan966x: Stop processing the MAC entry is port is wrong.
        net: lan966x: Fix when a port's upper is changed.
        net: lan966x: Fix IGMP snooping when frames have vlan tag
        net: lan966x: Update lan966x_ptp_get_nominal_value
        sctp: Initialize daddr on peeled off socket
        net/smc: Fix af_ops of child socket pointing to released memory
        net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
        net/smc: use memcpy instead of snprintf to avoid out of bounds read
        net: macb: Restart tx only if queue pointer is lagging
        ...
      d20339fa
    • Linus Torvalds's avatar
      Merge tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · b9b4c79e
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This became an unexpectedly large pull request due to various
        regression fixes in the previous kernels.
      
        The majority of fixes are a series of patches to address the
        regression at probe errors in devres'ed drivers, while there are yet
        more fixes for the x86 SG allocations and for USB-audio buffer
        management. In addition, a few HD-audio quirks and other small fixes
        are found"
      
      * tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (52 commits)
        ALSA: usb-audio: Limit max buffer and period sizes per time
        ALSA: memalloc: Add fallback SG-buffer allocations for x86
        ALSA: nm256: Don't call card private_free at probe error path
        ALSA: mtpav: Don't call card private_free at probe error path
        ALSA: rme9652: Fix the missing snd_card_free() call at probe error
        ALSA: hdspm: Fix the missing snd_card_free() call at probe error
        ALSA: hdsp: Fix the missing snd_card_free() call at probe error
        ALSA: oxygen: Fix the missing snd_card_free() call at probe error
        ALSA: lx6464es: Fix the missing snd_card_free() call at probe error
        ALSA: cmipci: Fix the missing snd_card_free() call at probe error
        ALSA: aw2: Fix the missing snd_card_free() call at probe error
        ALSA: als300: Fix the missing snd_card_free() call at probe error
        ALSA: lola: Fix the missing snd_card_free() call at probe error
        ALSA: bt87x: Fix the missing snd_card_free() call at probe error
        ALSA: sis7019: Fix the missing error handling
        ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error
        ALSA: via82xx: Fix the missing snd_card_free() call at probe error
        ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error
        ALSA: rme96: Fix the missing snd_card_free() call at probe error
        ALSA: rme32: Fix the missing snd_card_free() call at probe error
        ...
      b9b4c79e
    • Linus Torvalds's avatar
      Merge tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 722985e2
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "A few more code and warning fixes.
      
        There's one feature ioctl removal patch slated for 5.18 that did not
        make it to the main pull request. It's just a one-liner and the ioctl
        has a v2 that's in use for a long time, no point to postpone it to
        5.19.
      
        Late update:
      
         - remove balance v1 ioctl, superseded by v2 in 2012
      
        Fixes:
      
         - add back cgroup attribution for compressed writes
      
         - add super block write start/end annotations to asynchronous balance
      
         - fix root reference count on an error handling path
      
         - in zoned mode, activate zone at the chunk allocation time to avoid
           ENOSPC due to timing issues
      
         - fix delayed allocation accounting for direct IO
      
        Warning fixes:
      
         - simplify assertion condition in zoned check
      
         - remove an unused variable"
      
      * tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix btrfs_submit_compressed_write cgroup attribution
        btrfs: fix root ref counts in error handling in btrfs_get_root_ref
        btrfs: zoned: activate block group only for extent allocation
        btrfs: return allocated block group from do_chunk_alloc()
        btrfs: mark resumed async balance as writing
        btrfs: remove support of balance v1 ioctl
        btrfs: release correct delalloc amount in direct IO write path
        btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
        btrfs: zoned: remove redundant condition in btrfs_run_delalloc_range
      722985e2
    • Linus Torvalds's avatar
      Merge tag 'fscache-fixes-20220413' of... · ec9c57a7
      Linus Torvalds authored
      Merge tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      Pull fscache fixes from David Howells:
       "Here's a collection of fscache and cachefiles fixes and misc small
        cleanups. The two main fixes are:
      
         - Add a missing unmark of the inode in-use mark in an error path.
      
         - Fix a KASAN slab-out-of-bounds error when setting the xattr on a
           cachefiles volume due to the wrong length being given to memcpy().
      
        In addition, there's the removal of an unused parameter, removal of an
        unused Kconfig option, conditionalising a bit of procfs-related stuff
        and some doc fixes"
      
      * tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        fscache: remove FSCACHE_OLD_API Kconfig option
        fscache: Use wrapper fscache_set_cache_state() directly when relinquishing
        fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS
        fscache: Remove the cookie parameter from fscache_clear_page_bits()
        docs: filesystems: caching/backend-api.rst: fix an object withdrawn API
        docs: filesystems: caching/backend-api.rst: correct two relinquish APIs use
        cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr
        cachefiles: unmark inode in use in error path
      ec9c57a7