1. 18 Jun, 2021 23 commits
  2. 17 Jun, 2021 5 commits
  3. 16 Jun, 2021 12 commits
    • drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell. · 1c0b0efd
      Yifan Zhang authored
      If GC has entered CGPG, ringing a doorbell beyond the first page doesn't wake up GC.
      Enlarge CP_MEC_DOORBELL_RANGE_UPPER to work around this issue.
      Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
    • drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue. · 4cbbe348
      Yifan Zhang authored
      If GC has entered CGPG, ringing a doorbell beyond the first page doesn't wake up GC.
      Enlarge CP_MEC_DOORBELL_RANGE_UPPER to work around this issue.
      Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
    • Merge branch 'akpm' (patches from Andrew) · 70585216
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "18 patches.
      
        Subsystems affected by this patch series: mm (memory-failure, swap,
        slub, hugetlb, memory-failure, slub, thp, sparsemem), and coredump"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/sparse: fix check_usemap_section_nr warnings
        mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
        mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
        mm/thp: fix page_address_in_vma() on file THP tails
        mm/thp: fix vma_address() if virtual address below file offset
        mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
        mm/thp: make is_huge_zero_pmd() safe and quicker
        mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
        mm, thp: use head page in __migration_entry_wait()
        mm/slub.c: include swab.h
        crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo
        mm/memory-failure: make sure wait for page writeback in memory_failure
        mm/hugetlb: expand restore_reserve_on_error functionality
        mm/slub: actually fix freelist pointer vs redzoning
        mm/slub: fix redzoning for small allocations
        mm/slub: clarify verification reporting
        mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare
        mm,hwpoison: fix race with hugetlb page allocation
    • mm/sparse: fix check_usemap_section_nr warnings · ccbd6283
      Miles Chen authored
      I see a "virt_to_phys used for non-linear address" warning from
      check_usemap_section_nr() on arm64 platforms.
      
      In the current implementation of NODE_DATA, if CONFIG_NEED_MULTIPLE_NODES=y,
      pglist_data is dynamically allocated and assigned to node_data[].
      
      For example, in arch/arm64/include/asm/mmzone.h:
      
        extern struct pglist_data *node_data[];
        #define NODE_DATA(nid)          (node_data[(nid)])
      
      If CONFIG_NEED_MULTIPLE_NODES=n, pglist_data is defined as a global
      variable named "contig_page_data".
      
      For example, in include/linux/mmzone.h:
      
        extern struct pglist_data contig_page_data;
        #define NODE_DATA(nid)          (&contig_page_data)
      
      If CONFIG_DEBUG_VIRTUAL is not enabled, __pa() can handle both
      dynamically allocated linear addresses and symbol addresses.  However,
      if (CONFIG_DEBUG_VIRTUAL=y && CONFIG_NEED_MULTIPLE_NODES=n) we can see
      the "virt_to_phys used for non-linear address" warning because
      &contig_page_data is not a linear address on arm64.
      
      Warning message:
      
        virt_to_phys used for non-linear address: (contig_page_data+0x0/0x1c00)
        WARNING: CPU: 0 PID: 0 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x58/0x68
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper Tainted: G        W         5.13.0-rc1-00074-g1140ab59 #3
        Hardware name: linux,dummy-virt (DT)
        pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO BTYPE=--)
        Call trace:
           __virt_to_phys+0x58/0x68
           check_usemap_section_nr+0x50/0xfc
           sparse_init_nid+0x1ac/0x28c
           sparse_init+0x1c4/0x1e0
           bootmem_init+0x60/0x90
           setup_arch+0x184/0x1f0
           start_kernel+0x78/0x488
      
      To fix it, create a small function to handle both translations.
      
      Link: https://lkml.kernel.org/r/1623058729-27264-1-git-send-email-miles.chen@mediatek.com
      Signed-off-by: Miles Chen <miles.chen@mediatek.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Kazu <k-hagio-ab@nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
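As a rough illustration of the "small function" mentioned in the mm/sparse commit above, the sketch below picks the address translation based on a configuration macro. Everything here is a userspace stand-in: `mock_pa()`, `mock_pa_symbol()`, the two offsets, `MOCK_NEED_MULTIPLE_NODES`, and the helper name `pgdat_to_phys()` are assumptions made for illustration and are not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types; not the kernel's definitions. */
typedef uint64_t phys_addr_t;
struct pglist_data { int node_id; };

/* Assumed for illustration: linear-map and kernel-image addresses
 * translate to physical addresses with different offsets. */
#define LINEAR_OFFSET 0x1000u
#define IMAGE_OFFSET  0x8000u

/* Mock of a linear-map translation (the role __pa() plays). */
static inline phys_addr_t mock_pa(const void *va)
{
    return (phys_addr_t)(uintptr_t)va - LINEAR_OFFSET;
}

/* Mock of a symbol-address translation. */
static inline phys_addr_t mock_pa_symbol(const void *va)
{
    return (phys_addr_t)(uintptr_t)va - IMAGE_OFFSET;
}

/* One helper hides the choice, so callers never apply the linear-map
 * translation to a statically defined object by mistake. */
static inline phys_addr_t pgdat_to_phys(const struct pglist_data *pgdat)
{
    assert(pgdat != NULL);
#ifdef MOCK_NEED_MULTIPLE_NODES
    return mock_pa(pgdat);        /* dynamically allocated: linear address */
#else
    return mock_pa_symbol(pgdat); /* global variable: symbol address */
#endif
}
```

With `MOCK_NEED_MULTIPLE_NODES` left undefined, the helper takes the symbol path, which mirrors the configuration that triggered the warning in the commit message.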
    • mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split · 504e070d
      Yang Shi authored
      While debugging the bug reported by Wang Yugui [1], it turned out that
      try_to_unmap() may fail, but the first VM_BUG_ON_PAGE() only checks
      page_mapcount(), so it can miss the failure when the head page is
      unmapped but another subpage is still mapped.  The second DEBUG_VM
      BUG(), which checks the total mapcount, would then catch it.  This may
      cause some confusion.
      
      As this is not a fatal issue, consolidate the two DEBUG_VM checks
      into one VM_WARN_ON_ONCE_PAGE().
      
      [1] https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
      
      Link: https://lkml.kernel.org/r/d0f0db68-98b8-ebfb-16dc-f29df24cf012@google.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jue Wang <juew@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
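The difference between a head-only mapcount check and a single warn-once check over the whole page, as described in the commit above, can be sketched in userspace roughly as follows. The `mock_thp` structure, the counter layout, and all function names are hypothetical; the kernel's real page accounting is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NR_SUBPAGES 4 /* tiny stand-in for the number of subpages in a THP */

/* Hypothetical model of a compound page's mapping counters. */
struct mock_thp {
    int compound_mapcount;         /* mappings of the page as a whole */
    int sub_mapcount[NR_SUBPAGES]; /* per-subpage mappings */
};

/* Narrow check: only looks at the head page, so it can miss mapped tails. */
static inline int head_mapcount(const struct mock_thp *p)
{
    assert(p != NULL);
    return p->compound_mapcount + p->sub_mapcount[0];
}

/* Broad check: true if the whole page or any subpage is still mapped. */
static inline bool any_mapped(const struct mock_thp *p)
{
    assert(p != NULL);
    if (p->compound_mapcount > 0)
        return true;
    for (size_t i = 0; i < NR_SUBPAGES; i++)
        if (p->sub_mapcount[i] > 0)
            return true;
    return false;
}

static int warnings_printed; /* counts how many times the warning fired */

/* Single consolidated check: warn once, never crash, report the condition. */
static inline bool warn_once_if_still_mapped(const struct mock_thp *p)
{
    static bool warned;
    bool mapped = any_mapped(p);

    if (mapped && !warned) {
        warned = true;
        warnings_printed++;
        fprintf(stderr, "warning: failed to unmap page\n");
    }
    return mapped;
}
```

In this model a page whose head is unmapped but whose second subpage is still mapped passes the head-only check yet is caught by the consolidated one, and the warning is printed only on the first occurrence.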
    • mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() · 22061a1f
      Hugh Dickins authored
      There is a race between THP unmapping and truncation, when truncate sees
      pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared
      it, but before its page_remove_rmap() gets to decrement
      compound_mapcount: generating false "BUG: Bad page cache" reports that
      the page is still mapped when deleted.  This commit fixes that, but not
      in the way I hoped.
      
      The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK)
      instead of unmap_mapping_range() in truncate_cleanup_page(): it has
      often been an annoyance that we usually call unmap_mapping_range() with
      no pages locked, but there apply it to a single locked page.
      try_to_unmap() looks more suitable for a single locked page.
      
      However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page):
      it is used to insert THP migration entries, but not used to unmap THPs.
      Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB
      needs are different, I'm too ignorant of the DAX cases, and couldn't
      decide how far to go for anon+swap.  Set that aside.
      
      The second attempt took a different tack: make no change in truncate.c,
      but modify zap_huge_pmd() to insert an invalidated huge pmd instead of
      clearing it initially, then pmd_clear() between page_remove_rmap() and
      unlocking at the end.  Nice.  But powerpc blows that approach out of the
      water, with its serialize_against_pte_lookup(), and interesting pgtable
      usage.  It would need serious help to get working on powerpc (with a
      minor optimization issue on s390 too).  Set that aside.
      
      Just add an "if (page_mapped(page)) synchronize_rcu();" or other such
      delay, after unmapping in truncate_cleanup_page()? Perhaps, but though
      that's likely to reduce or eliminate the number of incidents, it would
      give less assurance of whether we had identified the problem correctly.
      
      This successful iteration introduces "unmap_mapping_page(page)" instead
      of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route,
      with an addition to details.  Then zap_pmd_range() watches for this
      case, and does spin_unlock(pmd_lock) if so - just like
      page_vma_mapped_walk() now does in the PVMW_SYNC case.  Not pretty, but
      safe.
      
      Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to
      assert its interface; but currently that's only used to make sure that
      page->mapping is stable, and zap_pmd_range() doesn't care if the page is
      locked or not.  Along these lines, in invalidate_inode_pages2_range()
      move the initial unmap_mapping_range() out from under page lock, before
      then calling unmap_mapping_page() under page lock if still mapped.
      
      Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com
      Fixes: fc127da0 ("truncate: handle file thp")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jue Wang <juew@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/thp: fix page_address_in_vma() on file THP tails · 31657170
      Jue Wang authored
      Anon THP tails were already supported, but memory-failure may need to
      use page_address_in_vma() on file THP tails, which its page->mapping
      check did not permit: fix it.
      
      hughd adds: no current usage is known to hit the issue, but this does
      fix a subtle trap in a general helper: best fixed in stable sooner than
      later.
      
      Link: https://lkml.kernel.org/r/a0d9b53-bf5d-8bab-ac5-759dc61819c1@google.com
      Fixes: 800d8c63 ("shmem: add huge pages support")
      Signed-off-by: Jue Wang <juew@google.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/thp: fix vma_address() if virtual address below file offset · 494334e4
      Hugh Dickins authored
      Running certain tests with a DEBUG_VM kernel would crash within hours,
      on the total_mapcount BUG() in split_huge_page_to_list(), while trying
      to free up some memory by punching a hole in a shmem huge page: split's
      try_to_unmap() was unable to find all the mappings of the page (which,
      on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
      
      When that BUG() was changed to a WARN(), it would later crash on the
      VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in
      mm/internal.h:vma_address(), used by rmap_walk_file() for
      try_to_unmap().
      
      vma_address() is usually correct, but there's a wraparound case when the
      vm_start address is unusually low, but vm_pgoff not so low:
      vma_address() chooses max(start, vma->vm_start), but that decides on the
      wrong address, because start has become almost ULONG_MAX.
      
      Rewrite vma_address() to be more careful about vm_pgoff; move the
      VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can
      be safely used from page_mapped_in_vma() and page_address_in_vma() too.
      
      Add vma_address_end() to apply similar care to end address calculation,
      in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one();
      though it raises a question of whether callers would do better to supply
      pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.
      
      An irritation is that their apparent generality breaks down on KSM
      pages, which cannot be located by the page->index that page_to_pgoff()
      uses: as commit 4b0ece6f ("mm: migrate: fix remove_migration_pte()
      for ksm pages") once discovered.  I dithered over the best thing to do
      about that, and have ended up with a VM_BUG_ON_PAGE(PageKsm) in both
      vma_address() and vma_address_end(); though the only place in danger of
      using it on them was try_to_unmap_one().
      
      Sidenote: vma_address() and vma_address_end() now use compound_nr() on a
      head page, instead of thp_size(): to make the right calculation on a
      hugetlbfs page, whether or not THPs are configured.  try_to_unmap() is
      used on hugetlbfs pages, but perhaps the wrong calculation never
      mattered.
      
      Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com
      Fixes: a8fa41ad ("mm, rmap: check all VMAs that PTE-mapped THP can be part of")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jue Wang <juew@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
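The wraparound described in the vma_address() commit above can be reproduced with plain unsigned arithmetic. The sketch below is a userspace model under stated assumptions: `mock_vma`, `naive_address()`, `careful_address()`, and `ADDR_EFAULT` are hypothetical names, and the careful variant only illustrates the idea of checking the page offset against vm_pgoff before subtracting; it is not presented as the kernel's code.

```c
#include <assert.h>

#define PAGE_SHIFT  12
#define ADDR_EFAULT ((unsigned long)-14L) /* stand-in for -EFAULT as an address */

/* Hypothetical subset of a VMA: address range plus file offset in pages. */
struct mock_vma {
    unsigned long vm_start;
    unsigned long vm_end;
    unsigned long vm_pgoff;
};

/* Naive form: subtract first, then clamp with max().  If pgoff < vm_pgoff
 * and vm_start is low, the sum wraps to a value close to ULONG_MAX and
 * max() then picks that bogus address. */
static inline unsigned long naive_address(unsigned long pgoff,
                                          const struct mock_vma *vma)
{
    unsigned long start = vma->vm_start +
                          ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
    return start > vma->vm_start ? start : vma->vm_start;
}

/* Careful form: compare page offsets before any subtraction.
 * nr_pages is the number of subpages the (possibly compound) page spans. */
static inline unsigned long careful_address(unsigned long pgoff,
                                            unsigned long nr_pages,
                                            const struct mock_vma *vma)
{
    assert(vma->vm_start < vma->vm_end);

    if (pgoff >= vma->vm_pgoff) {
        unsigned long address = vma->vm_start +
                                ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
        /* Reject addresses beyond the VMA, or wrapped past zero. */
        if (address < vma->vm_start || address >= vma->vm_end)
            return ADDR_EFAULT;
        return address;
    }
    /* Page starts before the VMA's file offset but may extend into it. */
    if (pgoff + nr_pages - 1 >= vma->vm_pgoff)
        return vma->vm_start;
    return ADDR_EFAULT;
}
```

For a VMA at 0x10000..0x110000 with vm_pgoff 0x100 and a 512-page compound page at file offset 0, the naive form returns an address far above vm_end, while the careful form returns vm_start.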
    • mm/thp: try_to_unmap() use TTU_SYNC for safe splitting · 732ed558
      Hugh Dickins authored
      Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
      (!unmap_success): with dump_page() showing mapcount:1, but then its raw
      struct page output showing _mapcount ffffffff i.e.  mapcount 0.
      
      And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
      it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
      and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
      all indicative of some mapcount difficulty in development here perhaps.
      But the !CONFIG_DEBUG_VM path handles the failures correctly and
      silently.
      
      I believe the problem is that once a racing unmap has cleared pte or
      pmd, try_to_unmap_one() may skip taking the page table lock, and emerge
      from try_to_unmap() before the racing task has reached decrementing
      mapcount.
      
      Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
      follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding
      TTU_SYNC to the options, and passing that from unmap_page().
      
      When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same
      for both: the slight overhead added should rarely matter, except perhaps
      if splitting sparsely-populated multiply-mapped shmem.  Once confident
      that bugs are fixed, TTU_SYNC here can be removed, and the race
      tolerated.
      
      Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com
      Fixes: fec89c10 ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jue Wang <juew@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/thp: make is_huge_zero_pmd() safe and quicker · 3b77e8c8
      Hugh Dickins authored
      Most callers of is_huge_zero_pmd() supply a pmd already verified
      present; but a few (notably zap_huge_pmd()) do not - it might be a pmd
      migration entry, in which the pfn is encoded differently from a present
      pmd: which might pass the is_huge_zero_pmd() test (though not on x86,
      since L1TF forced us to protect against that); or perhaps even crash in
      pmd_page() applied to a swap-like entry.
      
      Make it safe by adding pmd_present() check into is_huge_zero_pmd()
      itself; and make it quicker by saving huge_zero_pfn, so that
      is_huge_zero_pmd() will not need to do that pmd_page() lookup each time.
      
      __split_huge_pmd_locked() checked pmd_trans_huge() before: that worked,
      but is unnecessary now that is_huge_zero_pmd() checks present.
      
      Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com
      Fixes: e71769ae ("mm: enable thp migration for shmem thp")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jue Wang <juew@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
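A minimal userspace sketch of the two ideas in the is_huge_zero_pmd() commit above: compare against a cached pfn instead of looking up a page each time, and require the entry to be present so that a migration-style entry with matching bits is rejected. The pmd encoding, the `mock_` helpers, the sentinel value, and `set_huge_zero_pfn()` are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pmd encoding: bit 0 = present, pfn stored above bit 12. */
typedef struct { unsigned long val; } mock_pmd_t;

#define MOCK_PMD_PRESENT 0x1UL
#define MOCK_PFN_SHIFT   12
#define NO_HUGE_ZERO_PFN (~0UL) /* sentinel: no huge zero page allocated yet */

/* Cached pfn of the huge zero page, saved once when it is allocated. */
static unsigned long huge_zero_pfn = NO_HUGE_ZERO_PFN;

static inline void set_huge_zero_pfn(unsigned long pfn)
{
    assert(pfn != NO_HUGE_ZERO_PFN);
    huge_zero_pfn = pfn;
}

static inline bool mock_pmd_present(mock_pmd_t pmd)
{
    return (pmd.val & MOCK_PMD_PRESENT) != 0;
}

static inline unsigned long mock_pmd_pfn(mock_pmd_t pmd)
{
    return pmd.val >> MOCK_PFN_SHIFT;
}

/* Cheap pfn comparison first, then the present check that makes the
 * test safe for non-present (e.g. migration-style) entries. */
static inline bool is_huge_zero_pmd_sketch(mock_pmd_t pmd)
{
    return huge_zero_pfn == mock_pmd_pfn(pmd) && mock_pmd_present(pmd);
}
```

In this model an entry that carries the same pfn bits but lacks the present bit is not treated as the huge zero page.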
    • mm/thp: fix __split_huge_pmd_locked() on shmem migration entry · 99fa8a48
      Hugh Dickins authored
      Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10.
      
      Here is v2 batch of long-standing THP bug fixes that I had not got
      around to sending before, but prompted now by Wang Yugui's report
      https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
      
      Wang Yugui has tested a rollup of these fixes applied to 5.10.39, and
      they have done no harm, but have *not* fixed that issue: something more
      is needed and I have no idea of what.
      
      This patch (of 7):
      
      Stressing huge tmpfs page migration racing hole punch often crashed on
      the VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with DEBUG_VM=y
      kernel; or shortly afterwards, on a bad dereference in
      __split_huge_pmd_locked() when DEBUG_VM=n.  They forgot to allow for pmd
      migration entries in the non-anonymous case.
      
      Full disclosure: those particular experiments were on a kernel with more
      relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the
      vanilla kernel: it is conceivable that stricter locking happens to avoid
      those cases, or makes them less likely; but __split_huge_pmd_locked()
      already allowed for pmd migration entries when handling anonymous THPs,
      so this commit brings the shmem and file THP handling into line.
      
      And while there: use old_pmd rather than _pmd, as in the following
      blocks; and make it clearer to the eye that the !vma_is_anonymous()
      block is self-contained, making an early return after accounting for
      unmapping.
      
      Link: https://lkml.kernel.org/r/af88612-1473-2eaa-903-8d1a448b26@google.com
      Link: https://lkml.kernel.org/r/dd221a99-efb3-cd1d-6256-7e646af29314@google.com
      Fixes: e71769ae ("mm: enable thp migration for shmem thp")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jue Wang <juew@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, thp: use head page in __migration_entry_wait() · ffc90cbb
      Xu Yu authored
      We noticed that a hung task can happen in a corner-case but practical
      scenario when CONFIG_PREEMPT_NONE is enabled, as follows.
      
      Process 0                       Process 1                     Process 2..Inf
      split_huge_page_to_list
          unmap_page
              split_huge_pmd_address
                                      __migration_entry_wait(head)
                                                                    __migration_entry_wait(tail)
          remap_page (roll back)
              remove_migration_ptes
                  rmap_walk_anon
                      cond_resched
      
      Here __migration_entry_wait(tail) occurs in kernel space, e.g.,
      copy_to_user in fstat, which will immediately fault again without
      rescheduling, and thus fully occupy the CPU.
      
      When too many processes are performing __migration_entry_wait on a
      tail page, remap_page will never complete after cond_resched.
      
      This patch makes __migration_entry_wait operate on the compound head
      page, so that it waits for remap_page to complete, whether the THP is
      split successfully or rolled back.
      
      Note that put_and_wait_on_page_locked helps to drop the page reference
      acquired with get_page_unless_zero, as soon as the page is on the wait
      queue, before actually waiting.  So splitting the THP is only prevented
      for a brief interval.
      
      Link: https://lkml.kernel.org/r/b9836c1dd522e903891760af9f0c86a2cce987eb.1623144009.git.xuyu@linux.alibaba.com
      Fixes: ba988280 ("thp: add option to setup migration entries during PMD split")
      Suggested-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
      Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
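The core idea of the __migration_entry_wait() commit above, that every waiter resolves to the compound head so that all of them wait on the same page, can be modelled in a few lines. `mock_page`, `mock_compound_head()`, and `migration_wait_target()` are hypothetical names for a userspace sketch; locking, reference counting, and the wait queue itself are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page model: tail pages point at their head page. */
struct mock_page {
    struct mock_page *head; /* NULL for a head page or an ordinary page */
};

/* Resolve any subpage to the page that represents the whole compound page. */
static inline struct mock_page *mock_compound_head(struct mock_page *page)
{
    assert(page != NULL);
    return page->head ? page->head : page;
}

/* Pick the page to wait on: always the head, whether the faulting
 * address hit the head or a tail of the huge page. */
static inline struct mock_page *migration_wait_target(struct mock_page *page)
{
    return mock_compound_head(page);
}
```

In this model, waiters that faulted on a tail page and waiters that faulted on the head all end up with the same target.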