1. 02 Oct, 2024 4 commits
    • perf dwarf-aux: Fix build with !HAVE_DWARF_GETLOCATIONS_SUPPORT · 008979cc
      James Clark authored
      The linked fixes commit added an #include "dwarf-aux.h" to disasm.h
      which gets picked up in a lot of places. Without
      HAVE_DWARF_GETLOCATIONS_SUPPORT the stubs return an errno, so include
      errno.h to fix the following build error:
      
        In file included from util/disasm.h:8,
                       from util/annotate.h:16,
                       from builtin-top.c:23:
        util/dwarf-aux.h: In function 'die_get_var_range':
        util/dwarf-aux.h:183:10: error: 'ENOTSUP' undeclared (first use in this function)
          183 |  return -ENOTSUP;
              |          ^~~~~~~
      
      Fixes: 782959ac ("perf annotate: Add "update_insn_state" callback function to handle arch specific instruction tracking")
      Signed-off-by: James Clark <james.clark@linaro.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20241001123625.1063153-1-james.clark@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      008979cc
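      The failure mode fixed by 008979cc above can be shown in miniature. This is
      an illustrative sketch, not the perf source: HAVE_EXAMPLE_FEATURE and
      example_get_var_range() are hypothetical names standing in for
      HAVE_DWARF_GETLOCATIONS_SUPPORT and the real stub. The point is only that a
      header whose stub returns an errno value should include errno.h itself
      rather than rely on whoever includes it:

```c
#include <assert.h>  /* only needed for the check in the usage note */
#include <errno.h>   /* declares ENOTSUP; without it the stub below does not compile */
#include <stddef.h>

/* Hypothetical feature switch; stands in for HAVE_DWARF_GETLOCATIONS_SUPPORT. */
#ifdef HAVE_EXAMPLE_FEATURE
int example_get_var_range(const void *die, char *buf, size_t len);
#else
/* Stub used when the feature is compiled out: report "not supported". */
static inline int example_get_var_range(const void *die, char *buf, size_t len)
{
	(void)die;
	(void)buf;
	(void)len;
	return -ENOTSUP;
}
#endif
```

      With the feature macro undefined, a caller can check the stub with
      assert(example_get_var_range(NULL, NULL, 0) == -ENOTSUP).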
    • tools headers arm64: Sync arm64's cputype.h with the kernel sources · b9efb596
      Arnaldo Carvalho de Melo authored
      To get the changes in:
      
        db0d8a84 ("arm64: errata: Enable the AC03_CPU_38 workaround for ampere1a")
      
      That causes this perf source file to be rebuilt:
      
        CC      /tmp/build/perf-tools/util/arm-spe.o
      
      The changes in the above patch add MIDR_AMPERE1A, used in arm-spe.c, so
      probably we need to add it to that array?  Or maybe we need to leave
      this for later when this is all tested on those machines?
      
        static const struct midr_range neoverse_spe[] = {
                MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
                MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
                MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
                {},
        };
      
      Mark Rutland made this recommendation about arm-spe.c in a previous
      update to this file:
      
      "I would not touch this for now -- someone would have to go audit the
      TRMs to check that those other cores have the same encoding, and I think
      it'd be better to do that as a follow-up."
      
      That addresses this perf build warning:
      
        Warning: Kernel ABI header differences:
          diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: D Scott Phillips <scott@os.amperecomputing.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/lkml/ZvtFu7J-Awy2zuEJ@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b9efb596
    • perf tools: Cope with differences for lib/list_sort.c copy from the kernel · 36110669
      Arnaldo Carvalho de Melo authored
      With 6d74e1e3 ("tools/lib/list_sort: remove redundant code for
      cond_resched handling") we need to use the newly added hunk-based
      exceptions when comparing the copy we carry in tools/lib/ to the
      original file. Do it by adding the hunks that we know will be the
      expected diff.
      
      If at some point the original file is updated in other parts, then we
      should flag and check the file for update.
      Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Link: https://lore.kernel.org/lkml/20240930202136.16904-3-acme@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      36110669
    • tools check_headers.sh: Add check variant that excludes some hunks · cd46ea5a
      Arnaldo Carvalho de Melo authored
      With 6d74e1e3 ("tools/lib/list_sort: remove redundant code for
      cond_resched handling") we end up with a multi-line variation in the
      merge_final() implementation, one that the simple line-based exceptions
      we had so far can't cope with.
      
      Thus this check has been failing:
      
        Warning: Kernel ABI header differences:
          diff -u tools/lib/list_sort.c lib/list_sort.c
      
      So add a new check routine that uses grep -vf to exclude some hunks that
      we store in the tools/perf/check-header_ignore_hunks/ directory.
      
      This first patch is just the new check routine, the next one will use it
      to check lib/list_sort.c.
      Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Link: https://lore.kernel.org/lkml/20240930202136.16904-2-acme@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cd46ea5a
  2. 30 Sep, 2024 7 commits
  3. 27 Sep, 2024 10 commits
    • perf vdso: Missed put on 32-bit dsos · 424aafb6
      Ian Rogers authored
      If the dso type doesn't match then NULL is returned but the dso should
      be put first.
      
      Fixes: f649ed80 ("perf dsos: Tidy reference counting and locking")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240912182757.762369-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      424aafb6
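      The pattern behind 424aafb6 above (drop the reference before returning NULL
      on a type mismatch) can be sketched with a toy reference-counted object.
      All names here (struct obj, obj_get(), obj_put(), obj_find_typed()) are
      hypothetical and only illustrate the idea; they are not perf's dso API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy reference-counted object; a stand-in for a real refcounted struct. */
struct obj {
	int refcnt;
	int type;
};

/* Take a reference; returns the object for convenience. */
struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refcnt++;
	return o;
}

/* Drop a reference (a real implementation would free the object at zero). */
void obj_put(struct obj *o)
{
	if (o) {
		assert(o->refcnt > 0);
		o->refcnt--;
	}
}

/* Lookup that hands a counted reference to the caller. */
struct obj *obj_find(struct obj *candidate)
{
	return obj_get(candidate);
}

/*
 * Return the object only if its type matches. On a mismatch, the reference
 * obtained from obj_find() is dropped before returning NULL; otherwise it
 * would leak.
 */
struct obj *obj_find_typed(struct obj *candidate, int wanted_type)
{
	struct obj *o = obj_find(candidate);

	if (o && o->type != wanted_type) {
		obj_put(o);
		return NULL;
	}
	return o;
}
```

      After a mismatching lookup the count is unchanged; after a matching one
      the caller owns one extra reference and is expected to put it later.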
    • Merge remote-tracking branch 'torvalds/master' into perf-tools · 52c996d3
      Arnaldo Carvalho de Melo authored
      To pick up changes in other trees that may affect perf, such as libbpf
      and in general the header files that perf has copies of, so that we can
      do the sync with the kernel sources.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      52c996d3
    • Merge tag 'mm-hotfixes-stable-2024-09-27-09-45' of... · eee28084
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-09-27-09-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "19 hotfixes.  13 are cc:stable.
      
        There's a focus on fixes for the memfd_pin_folios() work which was
        added into 6.11. Apart from that, the usual shower of singleton fixes"
      
      * tag 'mm-hotfixes-stable-2024-09-27-09-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        ocfs2: fix uninit-value in ocfs2_get_block()
        zram: don't free statically defined names
        memory tiers: use default_dram_perf_ref_source in log message
        Revert "list: test: fix tests for list_cut_position()"
        kselftests: mm: fix wrong __NR_userfaultfd value
        compiler.h: specify correct attribute for .rodata..c_jump_table
        mm/damon/Kconfig: update DAMON doc URL
        mm: kfence: fix elapsed time for allocated/freed track
        ocfs2: fix deadlock in ocfs2_get_system_file_inode
        ocfs2: reserve space for inline xattr before attaching reflink tree
        mm: migrate: annotate data-race in migrate_folio_unmap()
        mm/hugetlb: simplify refs in memfd_alloc_folio
        mm/gup: fix memfd_pin_folios alloc race panic
        mm/gup: fix memfd_pin_folios hugetlb page allocation
        mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak
        mm/hugetlb: fix memfd_pin_folios free_huge_pages leak
        mm/filemap: fix filemap_get_folios_contig THP panic
        mm: make SPLIT_PTE_PTLOCKS depend on SMP
        tools: fix shared radix-tree build
      eee28084
    • Merge tag 'loongarch-6.12' of... · 36304006
      Linus Torvalds authored
      Merge tag 'loongarch-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch updates from Huacai Chen:
      
       - Fix objtool about do_syscall() and Clang
      
       - Enable generic CPU vulnerabilities support
      
       - Enable ACPI BGRT handling
      
       - Rework CPU feature probe from CPUCFG/IOCSR
      
       - Add ARCH_HAS_SET_MEMORY support
      
       - Add ARCH_HAS_SET_DIRECT_MAP support
      
       - Improve hardware page table walker
      
       - Simplify _percpu_read() and _percpu_write()
      
       - Add advanced extended IRQ model documentation
      
       - Some bug fixes and other small changes
      
      * tag 'loongarch-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        Docs/LoongArch: Add advanced extended IRQ model description
        LoongArch: Remove posix_types.h include from sigcontext.h
        LoongArch: Fix memleak in pci_acpi_scan_root()
        LoongArch: Simplify _percpu_read() and _percpu_write()
        LoongArch: Improve hardware page table walker
        LoongArch: Add ARCH_HAS_SET_DIRECT_MAP support
        LoongArch: Add ARCH_HAS_SET_MEMORY support
        LoongArch: Rework CPU feature probe from CPUCFG/IOCSR
        LoongArch: Enable ACPI BGRT handling
        LoongArch: Enable generic CPU vulnerabilites support
        LoongArch: Remove STACK_FRAME_NON_STANDARD(do_syscall)
        LoongArch: Set AS_HAS_THIN_ADD_SUB as y if AS_IS_LLVM
        LoongArch: Enable objtool for Clang
        objtool: Handle frame pointer related instructions
      36304006
    • Merge tag 'sh-for-v6.12-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux · ec384984
      Linus Torvalds authored
      Pull sh updates from John Paul Adrian Glaubitz:
       "The first change by Gaosheng Cui removes unused declarations which
        have been obsoleted since commit 5a4053b2 ("sh: Kill off dead
        boards.") and the second by his colleague Hongbo Li replaces the use
        of the unsafe simple_strtoul() with the safer kstrtoul() function in
        the sh interrupt controller driver code"
      
      * tag 'sh-for-v6.12-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
        sh: intc: Replace simple_strtoul() with kstrtoul()
        sh: Remove unused declarations for make_maskreg_irq() and irq_mask_register
      ec384984
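      The sh pull above replaces simple_strtoul() with kstrtoul() because the
      former silently accepts trailing garbage and cannot report overflow. The
      following is a user-space approximation of the stricter contract, built on
      the standard strtoul(); strict_strtoul() is a hypothetical name and this
      is not the kernel implementation:

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Strict unsigned parse: returns 0 on success, -EINVAL for malformed input
 * and -ERANGE on overflow. Approximates the kstrtoul() contract in user space.
 */
int strict_strtoul(const char *s, unsigned int base, unsigned long *res)
{
	char *end;
	unsigned long val;

	if (!s)
		return -EINVAL;
	if (*s == '+')
		s++;
	/* strtoul() would skip whitespace and accept '-'; reject both here. */
	if (!isalnum((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, (int)base);
	if (errno == ERANGE)
		return -ERANGE;
	if (errno != 0 || end == s)
		return -EINVAL;
	/* Tolerate one trailing newline, which is handy for sysfs-style input. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*res = val;
	return 0;
}
```

      With this helper, "42" and "42\n" parse to 42, while "42abc", an empty
      string, and a value too large for unsigned long are all reported as errors
      instead of being truncated or clamped without notice.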
    • Merge tag 'for-linus-6.12-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 653608c6
      Linus Torvalds authored
      Pull more xen updates from Juergen Gross:
       "A second round of Xen related changes and features:
      
         - a small fix of the xen-pciback driver for a warning issued by
           sparse
      
         - support PCI passthrough when using a PVH dom0
      
         - enable loading the kernel in PVH mode at arbitrary addresses,
           avoiding conflicts with the memory map when running as a Xen dom0
           using the host memory layout"
      
      * tag 'for-linus-6.12-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        x86/pvh: Add 64bit relocation page tables
        x86/kernel: Move page table macros to header
        x86/pvh: Set phys_base when calling xen_prepare_pvh()
        x86/pvh: Make PVH entrypoint PIC for x86-64
        xen: sync elfnote.h from xen tree
        xen/pciback: fix cast to restricted pci_ers_result_t and pci_power_t
        xen/privcmd: Add new syscall to get gsi from dev
        xen/pvh: Setup gsi for passthrough device
        xen/pci: Add a function to reset device for xen
      653608c6
    • Merge tag 'for-6.12/dm-changes' of... · e477dba5
      Linus Torvalds authored
      Merge tag 'for-6.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper updates from Mikulas Patocka:
      
       - Misc VDO fixes
      
       - Remove unused declarations dm_get_rq_mapinfo() and dm_zone_map_bio()
      
       - Dm-delay: Improve kernel documentation
      
       - Dm-crypt: Allow to specify the integrity key size as an option
      
       - Dm-bufio: Remove pointless NULL check
      
       - Small code cleanups: Use ERR_CAST; remove unlikely() around IS_ERR;
         use __assign_bit
      
       - Dm-integrity: Fix gcc 5 warning; convert comma to semicolon; fix
         smatch warning
      
       - Dm-integrity: Support recalculation in the 'I' mode
      
       - Revert "dm: requeue IO if mapping table not yet available"
      
       - Dm-crypt: Small refactoring to make the code more readable
      
       - Dm-cache: Remove pointless error check
      
       - Dm: Fix spelling errors
      
       - Dm-verity: Restart or panic on an I/O error if restart or panic was
         requested
      
       - Dm-verity: Fallback to platform keyring also if key in trusted
         keyring is rejected
      
      * tag 'for-6.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
        dm verity: fallback to platform keyring also if key in trusted keyring is rejected
        dm-verity: restart or panic on an I/O error
        dm: fix spelling errors
        dm-cache: remove pointless error check
        dm vdo: handle unaligned discards correctly
        dm vdo indexer: Convert comma to semicolon
        dm-crypt: Use common error handling code in crypt_set_keyring_key()
        dm-crypt: Use up_read() together with key_put() only once in crypt_set_keyring_key()
        Revert "dm: requeue IO if mapping table not yet available"
        dm-integrity: check mac_size against HASH_MAX_DIGESTSIZE in sb_mac()
        dm-integrity: support recalculation in the 'I' mode
        dm integrity: Convert comma to semicolon
        dm integrity: fix gcc 5 warning
        dm: Make use of __assign_bit() API
        dm integrity: Remove extra unlikely helper
        dm: Convert to use ERR_CAST()
        dm bufio: Remove NULL check of list_entry()
        dm-crypt: Allow to specify the integrity key size as option
        dm: Remove unused declaration and empty definition "dm_zone_map_bio"
        dm delay: enhance kernel documentation
        ...
      e477dba5
    • Merge tag 'ata-6.12-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux · b6c49fca
      Linus Torvalds authored
      Pull ata fixes from Damien Le Moal:
      
       - Fix a NULL pointer dereference introduced by the recent cleanups of
         the command duration limits feature handling (me)
      
       - Fix incorrect generation of the mode sense data for the
         ALL_SUB_MPAGES page (me)
      
      * tag 'ata-6.12-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
        ata: libata-scsi: Fix ata_msense_control() CDL page reporting
        ata: libata-scsi: Fix ata_msense_control_spgt2()
      b6c49fca
    • Merge tag 'driver-core-6.12-rc1' of... · e5f0e38e
      Linus Torvalds authored
      Merge tag 'driver-core-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core updates from Greg KH:
       "Here is a small set of patches for the driver core code for 6.12-rc1.
      
        This set is the one that caused the most delay on my side, due to lots
        of last-minute reports of problems in the async shutdown feature that
        was added. In the end, I've reverted all of the patches in that series
        so we are back to "normal" and the patch set is being reworked for the
        next merge window.
      
        Other than the async shutdown patches that were reverted, included in
        here are:
      
         - minor driver core cleanups
      
         - minor driver core bus and class api cleanups and simplifications
           for some callbacks
      
         - some const markings of structures
      
         - other even more minor cleanups
      
        All of these, including the last minute reverts, have been in
        linux-next, but all of the reports of problems in linux-next were
        before the reverts happened. After the reverts, all is good"
      
      * tag 'driver-core-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
        Revert "driver core: don't always lock parent in shutdown"
        Revert "driver core: separate function to shutdown one device"
        Revert "driver core: shut down devices asynchronously"
        Revert "nvme-pci: Make driver prefer asynchronous shutdown"
        Revert "driver core: fix async device shutdown hang"
        driver core: fix async device shutdown hang
        driver core: attribute_container: Remove unused functions
        driver core: Trivially simplify ((struct device_private *)curr)->device->p to @curr
        devres: Correclty strip percpu address space of devm_free_percpu() argument
        driver core: Make parameter check consistent for API cluster device_(for_each|find)_child()
        bus: fsl-mc: make fsl_mc_bus_type const
        nvme-pci: Make driver prefer asynchronous shutdown
        driver core: shut down devices asynchronously
        driver core: separate function to shutdown one device
        driver core: don't always lock parent in shutdown
        platform: Make platform_bus_type constant
        driver core: class: Check namespace relevant parameters in class_register()
        driver:base:core: Adding a "Return:" line in comment for device_link_add()
        drivers/base: Introduce device_match_t for device finding APIs
        firmware_loader: Block path traversal
        ...
      e5f0e38e
    • [tree-wide] finally take no_llseek out · cb787f4a
      Al Viro authored
      no_llseek had been defined to NULL two years ago, in commit 868941b1
      ("fs: remove no_llseek")
      
      To quote that commit,
      
        At -rc1 we'll need do a mechanical removal of no_llseek -
      
        git grep -l -w no_llseek | grep -v porting.rst | while read i; do
      	sed -i '/\<no_llseek\>/d' $i
        done
      
        would do it.
      
      Unfortunately, that hadn't been done.  Linus, could you do that now, so
      that we could finally put that thing to rest? All instances are of the
      form
      	.llseek = no_llseek,
      so it's obviously safe.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb787f4a
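      Why deleting every ".llseek = no_llseek," line in cb787f4a above is safe
      can be seen in a small sketch. Since no_llseek was already defined to
      NULL, the explicit initializer and an omitted member produce the same
      value. The struct and names below (toy_file_operations, toy_no_llseek,
      toy_read) are hypothetical, cut-down stand-ins, not the kernel
      definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for a table of file operation callbacks. */
struct toy_file_operations {
	long (*llseek)(void *file, long offset, int whence);
	long (*read)(void *file, char *buf, unsigned long len);
};

/* What no_llseek had become by then: a plain NULL pointer. */
#define toy_no_llseek NULL

long toy_read(void *file, char *buf, unsigned long len)
{
	(void)file;
	(void)buf;
	(void)len;
	return 0;
}

/* Before the cleanup: the member is set explicitly to NULL. */
const struct toy_file_operations fops_before = {
	.llseek = toy_no_llseek,
	.read   = toy_read,
};

/* After the cleanup: the line is gone; omitted members are zero-initialized. */
const struct toy_file_operations fops_after = {
	.read   = toy_read,
};
```

      Both tables end up with a NULL llseek member, so removing the line does
      not change behavior.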
  4. 26 Sep, 2024 19 commits
    • ocfs2: fix uninit-value in ocfs2_get_block() · 2af148ef
      Joseph Qi authored
      syzbot reported an uninit-value BUG:
      
      BUG: KMSAN: uninit-value in ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
      ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
      do_mpage_readpage+0xc45/0x2780 fs/mpage.c:225
      mpage_readahead+0x43f/0x840 fs/mpage.c:374
      ocfs2_readahead+0x269/0x320 fs/ocfs2/aops.c:381
      read_pages+0x193/0x1110 mm/readahead.c:160
      page_cache_ra_unbounded+0x901/0x9f0 mm/readahead.c:273
      do_page_cache_ra mm/readahead.c:303 [inline]
      force_page_cache_ra+0x3b1/0x4b0 mm/readahead.c:332
      force_page_cache_readahead mm/internal.h:347 [inline]
      generic_fadvise+0x6b0/0xa90 mm/fadvise.c:106
      vfs_fadvise mm/fadvise.c:185 [inline]
      ksys_fadvise64_64 mm/fadvise.c:199 [inline]
      __do_sys_fadvise64 mm/fadvise.c:214 [inline]
      __se_sys_fadvise64 mm/fadvise.c:212 [inline]
      __x64_sys_fadvise64+0x1fb/0x3a0 mm/fadvise.c:212
      x64_sys_call+0xe11/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:222
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
      entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      This is because when ocfs2_extent_map_get_blocks() fails, p_blkno is
      uninitialized.  So the error log will trigger the above uninit-value
      access.
      
      The error log is out-of-date since get_blocks() was removed a long time ago.
      And the error code will be logged in ocfs2_extent_map_get_blocks() once
      ocfs2_get_cluster() fails, so fix this by only logging inode and block.
      
      Link: https://syzkaller.appspot.com/bug?extid=9709e73bae885b05314b
      Link: https://lkml.kernel.org/r/20240925090600.3643376-1-joseph.qi@linux.alibaba.com
      Fixes: ccd979bd ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reported-by: syzbot+9709e73bae885b05314b@syzkaller.appspotmail.com
      Tested-by: syzbot+9709e73bae885b05314b@syzkaller.appspotmail.com
      Cc: Heming Zhao <heming.zhao@suse.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2af148ef
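      The bug class fixed by 2af148ef above is an error path that logs an output
      parameter the failing callee never wrote. A minimal sketch follows, with
      hypothetical names (toy_map_block(), toy_get_block()) and a made-up log
      format; it is not the ocfs2 code:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>  /* only needed for the strcmp() check in the usage note */

/* Hypothetical mapper: writes *p_blkno only on success. */
int toy_map_block(unsigned long long iblock, unsigned long long *p_blkno)
{
	if (iblock >= 1024)
		return -EIO;		/* *p_blkno is left untouched */
	*p_blkno = iblock + 4096;
	return 0;
}

/* Formats a log line into buf and returns the mapper's error code. */
int toy_get_block(unsigned long inode, unsigned long long iblock,
		  char *buf, size_t len)
{
	unsigned long long p_blkno;	/* not initialized, as in the report */
	int err = toy_map_block(iblock, &p_blkno);

	if (err) {
		/* Log only values known to be initialized: inode and block. */
		snprintf(buf, len, "mapping failed, inode: %lu, block: %llu",
			 inode, iblock);
		return err;
	}
	snprintf(buf, len, "inode %lu: block %llu -> %llu",
		 inode, iblock, p_blkno);
	return 0;
}
```

      On the failing path the message can be compared with strcmp() against
      "mapping failed, inode: 7, block: 2000" for inode 7 and block 2000; the
      physical block number never appears because it was never set.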
    • zram: don't free statically defined names · 486fd58a
      Andrey Skvortsov authored
      When CONFIG_ZRAM_MULTI_COMP isn't set ZRAM_SECONDARY_COMP can hold
      default_compressor, because it's the same offset as ZRAM_PRIMARY_COMP, so
      we need to make sure that we don't attempt to kfree() the statically
      defined compressor name.
      
      This is detected by KASAN.
      
      ==================================================================
        Call trace:
         kfree+0x60/0x3a0
         zram_destroy_comps+0x98/0x198 [zram]
         zram_reset_device+0x22c/0x4a8 [zram]
         reset_store+0x1bc/0x2d8 [zram]
         dev_attr_store+0x44/0x80
         sysfs_kf_write+0xfc/0x188
         kernfs_fop_write_iter+0x28c/0x428
         vfs_write+0x4dc/0x9b8
         ksys_write+0x100/0x1f8
         __arm64_sys_write+0x74/0xb8
         invoke_syscall+0xd8/0x260
         el0_svc_common.constprop.0+0xb4/0x240
         do_el0_svc+0x48/0x68
         el0_svc+0x40/0xc8
         el0t_64_sync_handler+0x120/0x130
         el0t_64_sync+0x190/0x198
      ==================================================================
      
      Link: https://lkml.kernel.org/r/20240923164843.1117010-1-andrej.skvortzov@gmail.com
      Fixes: 684826f8 ("zram: free secondary algorithms names")
      Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
      Closes: https://lore.kernel.org/lkml/57130e48-dbb6-4047-a8c7-ebf5aaea93f4@linux.vnet.ibm.com/
      Tested-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
      Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
      Cc: Chris Li <chrisl@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      486fd58a
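      The guard described in 486fd58a above (never free a statically defined
      name) can be sketched in user space with malloc()/free(). The names
      default_name, struct toy_dev, toy_release_name() and toy_set_name() are
      hypothetical; this only illustrates the pointer comparison, not the zram
      driver itself:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Statically allocated default name; must never be passed to free(). */
const char default_name[] = "default-compressor";

struct toy_dev {
	const char *comp_name;	/* default_name, a heap copy, or NULL */
};

/* Release the current name, skipping the statically defined default. */
void toy_release_name(struct toy_dev *d)
{
	if (d->comp_name != default_name)
		free((void *)d->comp_name);
	d->comp_name = NULL;
}

/* Replace the name with a heap copy; returns 0 on success, -1 on failure. */
int toy_set_name(struct toy_dev *d, const char *name)
{
	char *copy = malloc(strlen(name) + 1);

	if (!copy)
		return -1;
	strcpy(copy, name);
	toy_release_name(d);
	d->comp_name = copy;
	return 0;
}
```

      Releasing a device that still points at default_name leaves the static
      string intact, while a name installed through toy_set_name() is freed
      normally.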
    • memory tiers: use default_dram_perf_ref_source in log message · a530bbc5
      Huang Ying authored
      Commit 3718c02d ("acpi, hmat: calculate abstract distance with HMAT")
      added a default_dram_perf_ref_source variable that was initialized but
      never used.  This causes kmemleak to report the following memory leak:
      
      unreferenced object 0xff11000225a47b60 (size 16):
        comm "swapper/0", pid 1, jiffies 4294761654
        hex dump (first 16 bytes):
          41 43 50 49 20 48 4d 41 54 00 c1 4b 7d b7 75 7c  ACPI HMAT..K}.u|
        backtrace (crc e6d0e7b2):
          [<ffffffff95d5afdb>] __kmalloc_node_track_caller_noprof+0x36b/0x440
          [<ffffffff95c276d6>] kstrdup+0x36/0x60
          [<ffffffff95dfabfa>] mt_set_default_dram_perf+0x23a/0x2c0
          [<ffffffff9ad64733>] hmat_init+0x2b3/0x660
          [<ffffffff95203cec>] do_one_initcall+0x11c/0x5c0
          [<ffffffff9ac9cfc4>] do_initcalls+0x1b4/0x1f0
          [<ffffffff9ac9d52e>] kernel_init_freeable+0x4ae/0x520
          [<ffffffff97c789cc>] kernel_init+0x1c/0x150
          [<ffffffff952aecd1>] ret_from_fork+0x31/0x70
          [<ffffffff9520b18a>] ret_from_fork_asm+0x1a/0x30
      
      This reminds us that we forgot to use the performance data source
      information.  So, use the variable in the error log message to help
      identify the root cause of inconsistent performance numbers.
      
      Link: https://lkml.kernel.org/r/87y13mvo0n.fsf@yhuang6-desk2.ccr.corp.intel.com
      Fixes: 3718c02d ("acpi, hmat: calculate abstract distance with HMAT")
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Reported-by: Waiman Long <longman@redhat.com>
      Acked-by: Waiman Long <longman@redhat.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      a530bbc5
    • Revert "list: test: fix tests for list_cut_position()" · c509f67d
      Guenter Roeck authored
      This reverts commit e620799c.
      
      The commit introduces unit test failures.
      
           Expected cur == &entries[i], but
               cur == 0000037fffadfd80
               &entries[i] == 0000037fffadfd60
           # list_test_list_cut_position: pass:0 fail:1 skip:0 total:1
           not ok 21 list_test_list_cut_position
           # list_test_list_cut_before: EXPECTATION FAILED at lib/list-test.c:444
           Expected cur == &entries[i], but
               cur == 0000037fffa9fd70
               &entries[i] == 0000037fffa9fd60
           # list_test_list_cut_before: EXPECTATION FAILED at lib/list-test.c:444
           Expected cur == &entries[i], but
               cur == 0000037fffa9fd80
               &entries[i] == 0000037fffa9fd70
      
      Revert it.
      
      Link: https://lkml.kernel.org/r/20240922150507.553814-1-linux@roeck-us.net
      Fixes: e620799c ("list: test: fix tests for list_cut_position()")
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Cc: I Hsin Cheng <richard120310@gmail.com>
      Cc: David Gow <davidgow@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c509f67d
    • kselftests: mm: fix wrong __NR_userfaultfd value · f30beffd
      Muhammad Usama Anjum authored
      grep -rnIF "#define __NR_userfaultfd"
      tools/include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
      arch/x86/include/generated/uapi/asm/unistd_32.h:374:#define __NR_userfaultfd 374
      arch/x86/include/generated/uapi/asm/unistd_64.h:327:#define __NR_userfaultfd 323
      arch/x86/include/generated/uapi/asm/unistd_x32.h:282:#define __NR_userfaultfd (__X32_SYSCALL_BIT + 323)
      arch/arm/include/generated/uapi/asm/unistd-eabi.h:347:#define __NR_userfaultfd (__NR_SYSCALL_BASE + 388)
      arch/arm/include/generated/uapi/asm/unistd-oabi.h:359:#define __NR_userfaultfd (__NR_SYSCALL_BASE + 388)
      include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
      
      The number is dependent on the architecture. The above data shows that:
      x86	374
      x86_64	323
      
      The value of __NR_userfaultfd was changed to 282 when asm-generic/unistd.h
      was included.  This makes the test fail every time, as the correct number
      of this syscall on x86_64 is 323.  Fix the header to asm/unistd.h.
      
      Link: https://lkml.kernel.org/r/20240923053836.3270393-1-usama.anjum@collabora.com
      Fixes: a5c6bc59 ("selftests/mm: remove local __NR_* definitions")
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f30beffd
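      The point of f30beffd above is that the syscall number has to come from
      the architecture's own header rather than from the asm-generic table. A
      small sketch, assuming a Linux toolchain with the kernel UAPI headers
      installed; userfaultfd_syscall_nr() is a hypothetical helper, and the
      compile-time checks only restate the x86 values quoted in the commit
      message:

```c
#include <assert.h>
#include <asm/unistd.h>	/* per-architecture syscall numbers */

/* Compile-time restatement of the x86 values quoted in the commit message. */
#if defined(__x86_64__) && !defined(__ILP32__)
_Static_assert(__NR_userfaultfd == 323, "x86_64 value from the commit message");
#elif defined(__i386__)
_Static_assert(__NR_userfaultfd == 374, "x86 (32-bit) value from the commit message");
#endif

/* Returns the userfaultfd syscall number for the architecture being built. */
long userfaultfd_syscall_nr(void)
{
	return __NR_userfaultfd;
}
```

      On architectures that use the generic syscall table the same code would
      report the asm-generic number instead, which is why hard-coding either
      value in a test is fragile.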
    • compiler.h: specify correct attribute for .rodata..c_jump_table · c5b1184d
      Tiezhu Yang authored
      Currently, there is an assembler message when generating kernel/bpf/core.o
      under CONFIG_OBJTOOL with LoongArch compiler toolchain:
      
        Warning: setting incorrect section attributes for .rodata..c_jump_table
      
      This is because the section ".rodata..c_jump_table" should be readonly,
      but there is a "W" (writable) part of the flags:
      
        $ readelf -S kernel/bpf/core.o | grep -A 1 "rodata..c"
        [34] .rodata..c_j[...] PROGBITS         0000000000000000  0000d2e0
             0000000000000800  0000000000000000  WA       0     0     8
      
      This issue does not occur on x86 because the generated section flag is only
      "A" (allocatable). In order to silence the warning on LoongArch, specify
      the attribute like ".rodata..c_jump_table,\"a\",@progbits #" explicitly,
      so that the section attribute of ".rodata..c_jump_table" is readonly
      in the kernel/bpf/core.o file.
      
      Before:
      
        $ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
         21 .rodata..c_jump_table 00000800  0000000000000000  0000000000000000  0000d2e0  2**3
                        CONTENTS, ALLOC, LOAD, RELOC, DATA
      
      After:
      
        $ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
         21 .rodata..c_jump_table 00000800  0000000000000000  0000000000000000  0000d2e0  2**3
                        CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
      
      By the way, AFAICT, the root cause may be related to the different
      compiler behavior of various archs, so to some extent this change is a
      workaround for LoongArch. With this patch there is no effect on x86,
      which is the only port supported by objtool before LoongArch.
      
      Link: https://lkml.kernel.org/r/20240924062710.1243-1-yangtiezhu@loongson.cn
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Josh Poimboeuf <jpoimboe@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>	[6.9+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c5b1184d
    • Diederik de Haas's avatar
      mm/damon/Kconfig: update DAMON doc URL · 6901cf55
      Diederik de Haas authored
      The old URL doesn't really work anymore and as the documentation has been
      integrated in the main kernel documentation site, change the URL to point
      to that.
      
      Link: https://lkml.kernel.org/r/20240924082331.11499-1-didi.debian@cknow.org
      Signed-off-by: Diederik de Haas <didi.debian@cknow.org>
      Reviewed-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6901cf55
    • qiwu.chen's avatar
      mm: kfence: fix elapsed time for allocated/freed track · ff7f5ad7
      qiwu.chen authored
      Fix elapsed time for the allocated/freed track introduced by commit
      62e73fd8.
      
      Link: https://lkml.kernel.org/r/20240924085004.75401-1-qiwu.chen@transsion.com
      Fixes: 62e73fd8 ("mm: kfence: print the elapsed time for allocated/freed track")
      Signed-off-by: qiwu.chen <qiwu.chen@transsion.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ff7f5ad7
    • Mohammed Anees's avatar
      ocfs2: fix deadlock in ocfs2_get_system_file_inode · 7bf1823e
      Mohammed Anees authored
      syzbot has found a possible deadlock in ocfs2_get_system_file_inode [1].
      
      The scenario is depicted here,
      
      	CPU0					CPU1
      lock(&ocfs2_file_ip_alloc_sem_key);
                                     lock(&osb->system_file_mutex);
                                     lock(&ocfs2_file_ip_alloc_sem_key);
      lock(&osb->system_file_mutex);
      
      The function calls which could lead to this are:
      
      CPU0
      ocfs2_mknod - lock(&ocfs2_file_ip_alloc_sem_key);
      .
      .
      .
      ocfs2_get_system_file_inode - lock(&osb->system_file_mutex);
      
      CPU1 -
      ocfs2_fill_super - lock(&osb->system_file_mutex);
      .
      .
      .
      ocfs2_read_virt_blocks - lock(&ocfs2_file_ip_alloc_sem_key);
      
      This issue can be resolved by replacing down_read with
      down_read_trylock in ocfs2_read_virt_blocks.
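
      The back-off idea can be sketched in user space. This is a minimal
      illustration, not the ocfs2 code: toy_lock, toy_trylock, toy_unlock and
      read_blocks_nonblocking are hypothetical names, the lock is a plain
      atomic flag rather than a kernel rw_semaphore, and the -EAGAIN return
      value is an assumption.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock: a clear flag means "unlocked". Stands in for a kernel lock. */
struct toy_lock {
	atomic_flag held;
};

/* Try to take the lock without waiting; returns true on success. */
static bool toy_trylock(struct toy_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/*
 * Hypothetical stand-in for the read path. The caller is assumed to hold
 * the first lock already (think system_file_mutex). Blocking on the second
 * lock (think ip_alloc_sem) could complete an ABBA cycle, so the function
 * backs off with an error when the lock is contended.
 */
static int read_blocks_nonblocking(struct toy_lock *alloc_sem)
{
	if (!toy_trylock(alloc_sem))
		return -EAGAIN;	/* assumed error code; the caller may retry */

	/* ... read the blocks while holding the lock ... */

	toy_unlock(alloc_sem);
	return 0;
}
```

      A lock declared as "struct toy_lock sem = { ATOMIC_FLAG_INIT };" can be
      passed to read_blocks_nonblocking(). Because the second lock is never
      waited on, the circular wait shown in the CPU0/CPU1 scenario cannot
      form, at the cost of the caller having to handle the failure.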
      
      [1] https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
      
      Link: https://lkml.kernel.org/r/20240924093257.7181-1-pvmohammedanees2003@gmail.com
      Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reported-by: <syzbot+e0055ea09f1f5e6fabdd@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
      Tested-by: syzbot+e0055ea09f1f5e6fabdd@syzkaller.appspotmail.com
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      7bf1823e
    • Gautham Ananthakrishna's avatar
      ocfs2: reserve space for inline xattr before attaching reflink tree · 5ca60b86
      Gautham Ananthakrishna authored
      One of our customers reported a crash and a corrupted ocfs2 filesystem.
      The crash was due to the detection of corruption.  Upon troubleshooting,
      the fsck -fn output showed the following corruption:
      
      [EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
      but fsck believes the largest valid value is 227.  Clamp the next record value? n
      
      The stat output from the debugfs.ocfs2 showed the following corruption
      where the "Next Free Rec:" had overshot the "Count:" in the root metadata
      block.
      
              Inode: 33080590   Mode: 0640   Generation: 2619713622 (0x9c25a856)
              FS Generation: 904309833 (0x35e6ac49)
              CRC32: 00000000   ECC: 0000
              Type: Regular   Attr: 0x0   Flags: Valid
              Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
              Extended Attributes Block: 0  Extended Attributes Inline Size: 256
              User: 0 (root)   Group: 0 (root)   Size: 281320357888
              Links: 1   Clusters: 141738
              ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
              atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
              mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
              dtime: 0x0 -- Wed Dec 31 17:00:00 1969
              Refcount Block: 2777346
              Last Extblk: 2886943   Orphan Slot: 0
              Sub Alloc Slot: 0   Sub Alloc Bit: 14
              Tree Depth: 1   Count: 227   Next Free Rec: 230
              ## Offset        Clusters       Block#
              0  0             2310           2776351
              1  2310          2139           2777375
              2  4449          1221           2778399
              3  5670          731            2779423
              4  6401          566            2780447
              .......          ....           .......
              .......          ....           .......
      
      The issue was in the reflink workflow while reserving space for inline
      xattr.  The problematic function is ocfs2_reflink_xattr_inline().  By the
      time this function is called the reflink tree is already recreated at the
      destination inode from the source inode.  At this point, this function
      reserves space for inline xattrs at the destination inode without even
      checking if there is space at the root metadata block.  It simply reduces
      the l_count from 243 to 227 thereby making space of 256 bytes for inline
      xattr whereas the inode already has extents beyond this index (in this
      case up to 230), thereby causing corruption.
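
      The arithmetic behind these numbers can be checked with a small sketch.
      It assumes 16-byte extent records, which is what 256 bytes displacing
      243 - 227 = 16 records implies. recs_displaced and
      can_reserve_inline_xattr are hypothetical helpers for illustration, not
      ocfs2 functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: each extent record in the inode's root list is 16 bytes. */
#define EXTENT_REC_SIZE 16u

/* Number of extent records displaced by reserving inline_size bytes. */
static uint16_t recs_displaced(uint16_t inline_size)
{
	return (uint16_t)((inline_size + EXTENT_REC_SIZE - 1) / EXTENT_REC_SIZE);
}

/*
 * Hypothetical check: reserving inline xattr space is only safe if the
 * records already in use still fit in the reduced list.
 */
static bool can_reserve_inline_xattr(uint16_t l_count, uint16_t next_free_rec,
				     uint16_t inline_size)
{
	uint16_t lost = recs_displaced(inline_size);

	if (lost > l_count)
		return false;
	return next_free_rec <= l_count - lost;
}
```

      With the values from the report, can_reserve_inline_xattr(243, 230, 256)
      is false, because 230 records cannot fit in a list reduced to 227
      entries. Reserving the space before the reflink tree is recreated
      avoids reaching that state in the first place.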
      
      The fix for this is to reserve space for inline metadata at the destination
      inode before the reflink tree gets recreated. The customer has verified the
      fix.
      
      Link: https://lkml.kernel.org/r/20240918063844.1830332-1-gautham.ananthakrishna@oracle.com
      Fixes: ef962df0 ("ocfs2: xattr: fix inlined xattr reflink")
      Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      5ca60b86
    • Jeongjun Park's avatar
      mm: migrate: annotate data-race in migrate_folio_unmap() · 8001070c
      Jeongjun Park authored
      I found a report from syzbot [1].
      
      This report shows that the value can be changed, but in reality, the
      value of __folio_set_movable() cannot be changed because it holds the
      folio refcount.
      
      Therefore, it is appropriate to add an annotation so that KCSAN
      ignores this data race.
      
      [1]
      
      ==================================================================
      BUG: KCSAN: data-race in __filemap_remove_folio / migrate_pages_batch
      
      write to 0xffffea0004b81dd8 of 8 bytes by task 6348 on cpu 0:
       page_cache_delete mm/filemap.c:153 [inline]
       __filemap_remove_folio+0x1ac/0x2c0 mm/filemap.c:233
       filemap_remove_folio+0x6b/0x1f0 mm/filemap.c:265
       truncate_inode_folio+0x42/0x50 mm/truncate.c:178
       shmem_undo_range+0x25b/0xa70 mm/shmem.c:1028
       shmem_truncate_range mm/shmem.c:1144 [inline]
       shmem_evict_inode+0x14d/0x530 mm/shmem.c:1272
       evict+0x2f0/0x580 fs/inode.c:731
       iput_final fs/inode.c:1883 [inline]
       iput+0x42a/0x5b0 fs/inode.c:1909
       dentry_unlink_inode+0x24f/0x260 fs/dcache.c:412
       __dentry_kill+0x18b/0x4c0 fs/dcache.c:615
       dput+0x5c/0xd0 fs/dcache.c:857
       __fput+0x3fb/0x6d0 fs/file_table.c:439
       ____fput+0x1c/0x30 fs/file_table.c:459
       task_work_run+0x13a/0x1a0 kernel/task_work.c:228
       resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
       exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
       __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
       syscall_exit_to_user_mode+0xbe/0x130 kernel/entry/common.c:218
       do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      read to 0xffffea0004b81dd8 of 8 bytes by task 6342 on cpu 1:
       __folio_test_movable include/linux/page-flags.h:699 [inline]
       migrate_folio_unmap mm/migrate.c:1199 [inline]
       migrate_pages_batch+0x24c/0x1940 mm/migrate.c:1797
       migrate_pages_sync mm/migrate.c:1963 [inline]
       migrate_pages+0xff1/0x1820 mm/migrate.c:2072
       do_mbind mm/mempolicy.c:1390 [inline]
       kernel_mbind mm/mempolicy.c:1533 [inline]
       __do_sys_mbind mm/mempolicy.c:1607 [inline]
       __se_sys_mbind+0xf76/0x1160 mm/mempolicy.c:1603
       __x64_sys_mbind+0x78/0x90 mm/mempolicy.c:1603
       x64_sys_call+0x2b4d/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:238
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      value changed: 0xffff888127601078 -> 0x0000000000000000
      
      Link: https://lkml.kernel.org/r/20240924130053.107490-1-aha310510@gmail.com
      Fixes: 7e2a5e5a ("mm: migrate: use __folio_test_movable()")
      Signed-off-by: Jeongjun Park <aha310510@gmail.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8001070c
    • Steve Sistare's avatar
      mm/hugetlb: simplify refs in memfd_alloc_folio · dc677b5f
      Steve Sistare authored
      The folio_try_get in memfd_alloc_folio is not necessary.  Delete it, and
      delete the matching folio_put in memfd_pin_folios.  This also avoids
      leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
      fails.  That error path is also broken in a second way -- when its
      folio_put causes the ref to become 0, it will implicitly call
      free_huge_folio, but then the path *explicitly* calls free_huge_folio. 
      Delete the latter.
      
      This is a continuation of the fix
        "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
      
      [steven.sistare@oracle.com: remove explicit call to free_huge_folio(), per Matthew]
        Link: https://lkml.kernel.org/r/Zti-7nPVMcGgpcbi@casper.infradead.org
        Link: https://lkml.kernel.org/r/1725481920-82506-1-git-send-email-steven.sistare@oracle.com
      Link: https://lkml.kernel.org/r/1725478868-61732-1-git-send-email-steven.sistare@oracle.com
      Fixes: 89c1905d ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
      Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
      Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      dc677b5f
    • Steve Sistare's avatar
      mm/gup: fix memfd_pin_folios alloc race panic · ce645b9f
      Steve Sistare authored
      If memfd_pin_folios tries to create a hugetlb page, but someone else
      already did, then folio gets the value -EEXIST here:
      
              folio = memfd_alloc_folio(memfd, start_idx);
              if (IS_ERR(folio)) {
                      ret = PTR_ERR(folio);
                      if (ret != -EEXIST)
                              goto err;
      
      then on the next trip through the "while start_idx" loop we panic here:
      
              if (folio) {
                      folio_put(folio);
      
      To fix, set the folio to NULL on error.
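
      Why the stale value panics can be shown with user-space imitations of
      the kernel's error-pointer helpers: an error pointer is non-NULL, so a
      later "if (folio)" test passes and the error value is treated as an
      object. take_result is a hypothetical helper written for this sketch,
      not part of the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical helper: normalise an allocation result so that later
 * "if (folio)" checks only ever see NULL or a real object.
 * Returns 0 on success or a negative errno.
 */
static int take_result(void *result, void **folio)
{
	if (IS_ERR(result)) {
		*folio = NULL;	/* the fix: do not keep the error pointer */
		return (int)PTR_ERR(result);
	}
	*folio = result;
	return 0;
}
```

      ERR_PTR(-EEXIST) compares unequal to NULL, which is exactly why the
      unmodified loop reaches folio_put() with an invalid pointer on its next
      iteration.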
      
      Link: https://lkml.kernel.org/r/1725373521-451395-6-git-send-email-steven.sistare@oracle.com
      Fixes: 89c1905d ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
      Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
      Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ce645b9f
    • Steve Sistare's avatar
      mm/gup: fix memfd_pin_folios hugetlb page allocation · 9289f020
      Steve Sistare authored
      When memfd_pin_folios -> memfd_alloc_folio creates a hugetlb page, the
      index is wrong.  The subsequent call to filemap_get_folios_contig thus
      cannot find it, and fails, and memfd_pin_folios loops forever.  To fix,
      adjust the index for the huge_page_order.
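
      One plausible reading of the index adjustment is a conversion from an
      index counted in base pages to one counted in huge pages. The sketch
      below assumes 4 KiB base pages and 2 MiB huge pages (order 9);
      base_to_huge_index is a hypothetical name, not a kernel function.

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages and 2 MiB huge pages, i.e. order 9. */
#define HUGE_PAGE_ORDER 9u

/* Convert an index counted in base pages into one counted in huge pages. */
static uint64_t base_to_huge_index(uint64_t base_idx, unsigned int order)
{
	return base_idx >> order;
}
```

      Under these assumptions, base-page indices 512 through 1023 all map to
      huge-page index 1. If the two units are mixed up, the page is inserted
      at one position and looked up at another, which matches the lookup
      failure described above.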
      
      memfd_alloc_folio also forgets to unlock the folio, so the next touch of
      the page calls hugetlb_fault which blocks forever trying to take the lock.
      Unlock it.
      
      Link: https://lkml.kernel.org/r/1725373521-451395-5-git-send-email-steven.sistare@oracle.com
      Fixes: 89c1905d ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
      Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
      Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      9289f020
    • Steve Sistare's avatar
      mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak · 26a8ea80
      Steve Sistare authored
      memfd_pin_folios followed by unpin_folios leaves resv_huge_pages elevated
      if the pages were not already faulted in.  During a normal page fault,
      resv_huge_pages is consumed here:
      
      hugetlb_fault()
        alloc_hugetlb_folio()
          dequeue_hugetlb_folio_vma()
            dequeue_hugetlb_folio_nodemask()
              dequeue_hugetlb_folio_node_exact()
                free_huge_pages--
            resv_huge_pages--
      
      During memfd_pin_folios, the page is created by calling
      alloc_hugetlb_folio_nodemask instead of alloc_hugetlb_folio, and
      resv_huge_pages is not modified:
      
      memfd_alloc_folio()
        alloc_hugetlb_folio_nodemask()
          dequeue_hugetlb_folio_nodemask()
            dequeue_hugetlb_folio_node_exact()
              free_huge_pages--
      
      alloc_hugetlb_folio_nodemask has other callers that must not modify
      resv_huge_pages.  Therefore, to fix, define an alternate version of
      alloc_hugetlb_folio_nodemask for this call site that adjusts
      resv_huge_pages.
      
      Link: https://lkml.kernel.org/r/1725373521-451395-4-git-send-email-steven.sistare@oracle.com
      Fixes: 89c1905d ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
      Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
      Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      26a8ea80
    • Steve Sistare's avatar
      mm/hugetlb: fix memfd_pin_folios free_huge_pages leak · c56b6f3d
      Steve Sistare authored
      memfd_pin_folios followed by unpin_folios fails to restore free_huge_pages
      if the pages were not already faulted in, because the folio refcount for
      pages created by memfd_alloc_folio never goes to 0.  memfd_pin_folios
      needs another folio_put to undo the folio_try_get below:
      
      memfd_alloc_folio()
        alloc_hugetlb_folio_nodemask()
          dequeue_hugetlb_folio_nodemask()
            dequeue_hugetlb_folio_node_exact()
              folio_ref_unfreeze(folio, 1);    ; adds 1 refcount
        folio_try_get()                        ; adds 1 refcount
        hugetlb_add_to_page_cache()            ; adds 512 refcount (on x86)
      
      With the fix, after memfd_pin_folios + unpin_folios, the refcount for the
      (unfaulted) page is 512, which is correct, as the refcount for a faulted
      unpinned page is 513.
      
      Link: https://lkml.kernel.org/r/1725373521-451395-3-git-send-email-steven.sistare@oracle.com
      Fixes: 89c1905d ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
      Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
      Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c56b6f3d
    • Steve Sistare's avatar
      mm/filemap: fix filemap_get_folios_contig THP panic · c225c4f6
      Steve Sistare authored
      Patch series "memfd-pin huge page fixes".
      
      Fix multiple bugs that occur when using memfd_pin_folios with hugetlb
      pages and THP.  The hugetlb bugs only bite when the page is not yet
      faulted in when memfd_pin_folios is called.  The THP bug bites when the
      starting offset passed to memfd_pin_folios is not huge page aligned.  See
      the commit messages for details.
      
      
      This patch (of 5):
      
      memfd_pin_folios on memory backed by THP panics if the requested start
      offset is not huge page aligned:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000036
      RIP: 0010:filemap_get_folios_contig+0xdf/0x290
      RSP: 0018:ffffc9002092fbe8 EFLAGS: 00010202
      RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000002
      
      The fault occurs here, because xas_load returns a folio with value 2:
      
          filemap_get_folios_contig()
              for (folio = xas_load(&xas); folio && xas.xa_index <= end;
                              folio = xas_next(&xas)) {
                      ...
                      if (!folio_try_get(folio))   <-- BOOM
      
      "2" is an xarray sibling entry.  We get it because memfd_pin_folios does
      not round the indices passed to filemap_get_folios_contig to huge page
      boundaries for THP, so we load from the middle of a huge page range and
      see a sibling.  (It does round for hugetlbfs, at the is_file_hugepages
      test.)
      
      To fix, if the folio is a sibling, then return the next index as the
      starting point for the next call to filemap_get_folios_contig.
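
      How the value 2 encodes a sibling entry can be sketched with user-space
      copies of the xarray tagging helpers. This is an illustration only: it
      assumes 64 slots per node, and the real xa_is_sibling() additionally
      depends on CONFIG_XARRAY_MULTI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64 slots per xarray node (XA_CHUNK_SHIFT of 6). */
#define XA_CHUNK_SIZE 64ul

/* Internal entries have their two low bits set to binary 10. */
static inline void *xa_mk_internal(unsigned long v)
{
	return (void *)((v << 2) | 2);
}

static inline bool xa_is_internal(const void *entry)
{
	return ((uintptr_t)entry & 3) == 2;
}

/* A sibling entry encodes the slot offset of the canonical entry. */
static inline void *xa_mk_sibling(unsigned long offset)
{
	return xa_mk_internal(offset);
}

static inline bool xa_is_sibling(const void *entry)
{
	return xa_is_internal(entry) &&
	       (uintptr_t)entry < (uintptr_t)xa_mk_sibling(XA_CHUNK_SIZE - 1);
}
```

      Here xa_mk_sibling(0) evaluates to the pointer value 2, which is a
      sibling referring to slot 0 of the node. Treating that value as a folio
      and touching a field at a small offset is consistent with the faulting
      address shown in the oops.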
      
      Link: https://lkml.kernel.org/r/1725373521-451395-1-git-send-email-steven.sistare@oracle.com
      Link: https://lkml.kernel.org/r/1725373521-451395-2-git-send-email-steven.sistare@oracle.com
      Fixes: 89c1905d ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
      Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c225c4f6
    • Guenter Roeck's avatar
      mm: make SPLIT_PTE_PTLOCKS depend on SMP · a3344078
      Guenter Roeck authored
      SPLIT_PTE_PTLOCKS depends on "NR_CPUS >= 4".  Unfortunately, that
      evaluates to true if there is no NR_CPUS configuration option.  This
      results in CONFIG_SPLIT_PTE_PTLOCKS=y for mac_defconfig.  This in turn
      causes the m68k "q800" and "virt" machines to crash in qemu if debugging
      options are enabled.
      
      Making CONFIG_SPLIT_PTE_PTLOCKS dependent on the existence of NR_CPUS does
      not work since a dependency on the existence of a numeric Kconfig entry
      always evaluates to false.  Example:
      
      config HAVE_NO_NR_CPUS
             def_bool y
             depends on !NR_CPUS
      
      After adding this to a Kconfig file, "make defconfig" includes:
      $ grep NR_CPUS .config
      CONFIG_NR_CPUS=64
      CONFIG_HAVE_NO_NR_CPUS=y
      
      Defining NR_CPUS for m68k does not help either since many architectures
      define NR_CPUS only for SMP configurations.
      
      Make SPLIT_PTE_PTLOCKS depend on SMP instead to solve the problem.
      
      Link: https://lkml.kernel.org/r/20240924154205.1491376-1-linux@roeck-us.net
      Fixes: 394290cb ("mm: turn USE_SPLIT_PTE_PTLOCKS / USE_SPLIT_PTE_PTLOCKS into Kconfig options")
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      a3344078
    • Lorenzo Stoakes's avatar
      tools: fix shared radix-tree build · c234c653
      Lorenzo Stoakes authored
      The shared radix-tree build is not correctly recompiling when
      lib/maple_tree.c and lib/test_maple_tree.c are modified - fix this by
      adding these core components to the SHARED_DEPS list.
      
      Additionally, add missing header guards to shared header files.
      
      Link: https://lkml.kernel.org/r/20240924180724.112169-1-lorenzo.stoakes@oracle.com
      Fixes: 74579d8d ("tools: separate out shared radix-tree components")
      Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Tested-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c234c653