1. 26 Mar, 2024 16 commits
    • tmpfs: fix race on handling dquot rbtree · 0a69b6b3
      Carlos Maiolino authored
      A syzkaller reproducer found a race while attempting to remove dquot
      information from the rb tree.
      
      Fetching the rb_tree root node must also be protected by
      dqopt->dqio_sem. Otherwise, given the right timing,
      shmem_release_dquot() will trigger a warning because it can't find a
      node in the tree, when the real reason is that the root node changed
      before the search started:
      
      Thread 1				Thread 2
      - shmem_release_dquot()			- shmem_{acquire,release}_dquot()
      
      - fetch ROOT				- Fetch ROOT
      
      					- acquire dqio_sem
      - wait dqio_sem
      
      					- do something, trigger a tree rebalance
      					- release dqio_sem
      
      - acquire dqio_sem
      - start searching for the node, but
        from the wrong location, missing
        the node, and triggering a warning.
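
A minimal user-space sketch of the corrected ordering, assuming a pthread mutex stands in for dqopt->dqio_sem and a plain binary search tree stands in for the quota rb tree. `struct qnode`, `tree_root`, `tree_lock`, `find_locked()` and `release_lookup()` are hypothetical names, not the shmem code. The only point it makes is that the root pointer is read after the lock is taken, so a concurrent rebalance cannot leave the search starting from a stale root.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a node of the shmem dquot rb tree. */
struct qnode {
	unsigned int id;
	struct qnode *left, *right;
};

/* Hypothetical stand-ins for the tree root and dqopt->dqio_sem. */
static struct qnode *tree_root;
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

/* Search the tree; the caller must already hold tree_lock. */
static struct qnode *find_locked(unsigned int id)
{
	/* The root is fetched here, i.e. only while the lock is held. */
	struct qnode *node = tree_root;

	while (node) {
		if (id < node->id)
			node = node->left;
		else if (id > node->id)
			node = node->right;
		else
			return node;
	}
	return NULL;
}

/*
 * Racy shape: "root = tree_root; lock(); search from root;" lets another
 * thread rebalance the tree between the read and the lock.
 * Fixed shape (below): lock first, then read the root and search.
 */
bool release_lookup(unsigned int id)
{
	bool found;

	pthread_mutex_lock(&tree_lock);
	found = find_locked(id) != NULL;
	/* The real code would erase and free the node here. */
	pthread_mutex_unlock(&tree_lock);
	return found;
}
```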
      
      Link: https://lkml.kernel.org/r/20240320124011.398847-1-cem@kernel.org
      Fixes: eafc474e ("shmem: prepare shmem quota infrastructure")
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reported-by: Ubisectech Sirius <bugreport@ubisectech.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: sigbus-wp test requires UFFD_FEATURE_WP_HUGETLBFS_SHMEM · 105840eb
      Edward Liaw authored
      The sigbus-wp test requires the UFFD_FEATURE_WP_HUGETLBFS_SHMEM flag for
      shmem and hugetlb targets.  Without that requirement, the test is not
      backwards compatible with kernels older than 5.19 and fails there with
      EINVAL.
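
The kernel reports its supported userfaultfd features through the UFFDIO_API handshake, so a test can compare its requirements against that mask and skip instead of failing. A minimal sketch of that check, assuming hypothetical MODEL_FEAT_* bits in place of the real UFFD_FEATURE_* values from <linux/userfaultfd.h>; the actual selftest expresses this through its own test-case table, which is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; NOT the real UFFD_FEATURE_* values. */
#define MODEL_FEAT_SIGBUS              (UINT64_C(1) << 0)
#define MODEL_FEAT_WP                  (UINT64_C(1) << 1)
#define MODEL_FEAT_WP_HUGETLBFS_SHMEM  (UINT64_C(1) << 2)

enum mem_type { MEM_ANON, MEM_SHMEM, MEM_HUGETLB };

/* Features the sigbus-wp case needs for a given backing memory type. */
uint64_t sigbus_wp_required(enum mem_type type)
{
	uint64_t req = MODEL_FEAT_SIGBUS | MODEL_FEAT_WP;

	/* shmem and hugetlb targets additionally need WP support for them. */
	if (type == MEM_SHMEM || type == MEM_HUGETLB)
		req |= MODEL_FEAT_WP_HUGETLBFS_SHMEM;
	return req;
}

/* Run only if every required feature is advertised; otherwise skip. */
bool can_run(uint64_t required, uint64_t available)
{
	return (required & available) == required;
}
```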
      
      Link: https://lkml.kernel.org/r/20240321232023.2064975-1-edliaw@google.com
      Fixes: 73c1ea93 ("selftests/mm: move uffd sig/events tests into uffd unit tests")
      Signed-off-by: Edward Liaw <edliaw@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: zswap: fix writeback shinker GFP_NOIO/GFP_NOFS recursion · 30fb6a8d
      Johannes Weiner authored
      Kent forwards this bug report of zswap re-entering the block layer
      from an IO request allocation and locking up:
      
      [10264.128242] sysrq: Show Blocked State
      [10264.128268] task:kworker/20:0H   state:D stack:0     pid:143   tgid:143   ppid:2      flags:0x00004000
      [10264.128271] Workqueue: bcachefs_io btree_write_submit [bcachefs]
      [10264.128295] Call Trace:
      [10264.128295]  <TASK>
      [10264.128297]  __schedule+0x3e6/0x1520
      [10264.128303]  schedule+0x32/0xd0
      [10264.128304]  schedule_timeout+0x98/0x160
      [10264.128308]  io_schedule_timeout+0x50/0x80
      [10264.128309]  wait_for_completion_io_timeout+0x7f/0x180
      [10264.128310]  submit_bio_wait+0x78/0xb0
      [10264.128313]  swap_writepage_bdev_sync+0xf6/0x150
      [10264.128317]  zswap_writeback_entry+0xf2/0x180
      [10264.128319]  shrink_memcg_cb+0xe7/0x2f0
      [10264.128322]  __list_lru_walk_one+0xb9/0x1d0
      [10264.128325]  list_lru_walk_one+0x5d/0x90
      [10264.128326]  zswap_shrinker_scan+0xc4/0x130
      [10264.128327]  do_shrink_slab+0x13f/0x360
      [10264.128328]  shrink_slab+0x28e/0x3c0
      [10264.128329]  shrink_one+0x123/0x1b0
      [10264.128331]  shrink_node+0x97e/0xbc0
      [10264.128332]  do_try_to_free_pages+0xe7/0x5b0
      [10264.128333]  try_to_free_pages+0xe1/0x200
      [10264.128334]  __alloc_pages_slowpath.constprop.0+0x343/0xde0
      [10264.128337]  __alloc_pages+0x32d/0x350
      [10264.128338]  allocate_slab+0x400/0x460
      [10264.128339]  ___slab_alloc+0x40d/0xa40
      [10264.128345]  kmem_cache_alloc+0x2e7/0x330
      [10264.128348]  mempool_alloc+0x86/0x1b0
      [10264.128349]  bio_alloc_bioset+0x200/0x4f0
      [10264.128352]  bio_alloc_clone+0x23/0x60
      [10264.128354]  alloc_io+0x26/0xf0 [dm_mod 7e9e6b44df4927f93fb3e4b5c782767396f58382]
      [10264.128361]  dm_submit_bio+0xb8/0x580 [dm_mod 7e9e6b44df4927f93fb3e4b5c782767396f58382]
      [10264.128366]  __submit_bio+0xb0/0x170
      [10264.128367]  submit_bio_noacct_nocheck+0x159/0x370
      [10264.128368]  bch2_submit_wbio_replicas+0x21c/0x3a0 [bcachefs 85f1b9a7a824f272eff794653a06dde1a94439f2]
      [10264.128391]  btree_write_submit+0x1cf/0x220 [bcachefs 85f1b9a7a824f272eff794653a06dde1a94439f2]
      [10264.128406]  process_one_work+0x178/0x350
      [10264.128408]  worker_thread+0x30f/0x450
      [10264.128409]  kthread+0xe5/0x120
      
      The zswap shrinker resumes the swap_writepage()s that were intercepted
      by the zswap store. This will enter the block layer, and may even
      enter the filesystem depending on the swap backing file.
      
      Make the zswap shrinker respect GFP_NOIO and GFP_NOFS.
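
A user-space model of one way to honour those flags: writeback may need both IO and filesystem recursion, so the shrinker reports nothing to scan unless the reclaim context allows both. The MODEL_GFP_* values are hypothetical stand-ins for __GFP_IO/__GFP_FS, and placing the check in the count callback is an assumption about the patch, not a quote of it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for __GFP_IO and __GFP_FS. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_IO)	/* IO allowed, no fs recursion */
#define MODEL_GFP_NOIO   0x0u		/* neither IO nor fs recursion */

/* Writeback enters the block layer and possibly the filesystem. */
static bool may_writeback(unsigned int gfp_mask)
{
	return (gfp_mask & (MODEL_GFP_IO | MODEL_GFP_FS)) ==
	       (MODEL_GFP_IO | MODEL_GFP_FS);
}

/* Shrinker "count" step: report 0 so no scan (and no writeback) runs. */
unsigned long zswap_count_model(unsigned int gfp_mask,
				unsigned long nr_stored)
{
	return may_writeback(gfp_mask) ? nr_stored : 0;
}
```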
      
      Link: https://lore.kernel.org/linux-mm/rc4pk2r42oyvjo4dc62z6sovquyllq56i5cdgcaqbd7wy3hfzr@n4nbxido3fme/
      Link: https://lkml.kernel.org/r/20240321182532.60000-1-hannes@cmpxchg.org
      Fixes: b5ba474f ("zswap: shrink zswap pool based on memory pressure")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Kent Overstreet <kent.overstreet@linux.dev>
      Acked-by: Yosry Ahmed <yosryahmed@google.com>
      Reported-by: Jérôme Poulin <jeromepoulin@gmail.com>
      Reviewed-by: Nhat Pham <nphamcs@gmail.com>
      Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
      Cc: stable@vger.kernel.org	[v6.8]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • ARM: prctl: reject PR_SET_MDWE on pre-ARMv6 · 166ce846
      Zev Weiss authored
      On v5 and lower CPUs we can't provide MDWE protection, so ensure we fail
      any attempt to enable it via prctl(PR_SET_MDWE).
      
      Previously such an attempt would misleadingly succeed, leading to any
      subsequent mmap(PROT_READ|PROT_WRITE) or execve() failing unconditionally
      (the latter somewhat violently via force_fatal_sig(SIGSEGV) due to
      READ_IMPLIES_EXEC).
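
With this fix, user space can treat EINVAL from prctl(PR_SET_MDWE) as "MDWE not available". A minimal sketch of such a probe; `probe_mdwe()` and the enum are hypothetical, it assumes the PR_SET_MDWE / PR_MDWE_REFUSE_EXEC_GAIN interface added in 6.3, and it only falls back to numeric values if the installed headers lack them. EINVAL is also what kernels older than 6.3 return for the unknown option, and an unfixed kernel on ARMv5 would still report success, which is the misleading behaviour this commit removes.

```c
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks in case the installed headers predate PR_SET_MDWE. */
#ifndef PR_SET_MDWE
#define PR_SET_MDWE 65
#endif
#ifndef PR_MDWE_REFUSE_EXEC_GAIN
#define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#endif

enum mdwe_probe { MDWE_SUPPORTED, MDWE_UNSUPPORTED, MDWE_PROBE_ERROR };

/*
 * Try to enable MDWE in a throwaway child: once set it cannot be
 * cleared, so probing in the caller itself would be irreversible.
 */
enum mdwe_probe probe_mdwe(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return MDWE_PROBE_ERROR;
	if (pid == 0) {
		if (prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0L, 0L, 0L) == 0)
			_exit(0);
		/* EINVAL: rejected (unsupported CPU, or kernel too old). */
		_exit(errno == EINVAL ? 1 : 2);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return MDWE_PROBE_ERROR;
	switch (WEXITSTATUS(status)) {
	case 0:
		return MDWE_SUPPORTED;
	case 1:
		return MDWE_UNSUPPORTED;
	default:
		return MDWE_PROBE_ERROR;
	}
}
```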
      
      Link: https://lkml.kernel.org/r/20240227013546.15769-6-zev@bewilderbeest.net
      Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
      Cc: <stable@vger.kernel.org>	[6.3+]
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Florent Revest <revest@chromium.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ondrej Mosnacek <omosnace@redhat.com>
      Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Cc: Russell King (Oracle) <linux@armlinux.org.uk>
      Cc: Sam James <sam@gentoo.org>
      Cc: Stefan Roesch <shr@devkernel.io>
      Cc: Yang Shi <yang@os.amperecomputing.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • prctl: generalize PR_SET_MDWE support check to be per-arch · d5aad4c2
      Zev Weiss authored
      Patch series "ARM: prctl: Reject PR_SET_MDWE where not supported".
      
      I noticed after a recent kernel update that my ARM926 system started
      segfaulting on any execve() after calling prctl(PR_SET_MDWE).  After some
      investigation it appears that ARMv5 is incapable of providing the
      appropriate protections for MDWE, since any readable memory is also
      implicitly executable.
      
      The prctl_set_mdwe() function already contained special-case logic
      disabling it on PARISC (commit 79383813, "prctl: Disable
      prctl(PR_SET_MDWE) on parisc"); this patch series (1) generalizes that
      check to use an arch_*() function, and (2) adds a corresponding override
      for ARM to disable MDWE on pre-ARMv6 CPUs.
      
      With the series applied, prctl(PR_SET_MDWE) is rejected on ARMv5 and
      subsequent execve() calls (as well as mmap(PROT_READ|PROT_WRITE)) can
      succeed instead of unconditionally failing; on ARMv6 the prctl works as it
      did previously.
      
      [0] https://lore.kernel.org/all/2023112456-linked-nape-bf19@gregkh/
      
      
      This patch (of 2):
      
      There exist systems other than PARISC where MDWE may not be feasible to
      support; rather than cluttering up the generic code with additional
      arch-specific logic let's add a generic function for checking MDWE support
      and allow each arch to override it as needed.
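
The kernel often wires up such per-arch hooks with the `#ifndef name` / `#define name name` idiom. The single-file sketch below shows how a generic default and an ARM-style override could fit together; `arch_mdwe_supported()`, `model_cpu_arch` and `model_prctl_set_mdwe()` are hypothetical names, and the exact helper name and check used by the series are not reproduced here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ARM CPU architecture query. */
static int model_cpu_arch = 5;

/* "arch header": ARM-style override, seen before the generic header. */
static inline bool arch_mdwe_supported(void)
{
	/* Before ARMv6, any readable memory is implicitly executable. */
	return model_cpu_arch >= 6;
}
#define arch_mdwe_supported arch_mdwe_supported

/* "generic header": default for architectures without an override. */
#ifndef arch_mdwe_supported
static inline bool arch_mdwe_supported(void)
{
	return true;
}
#define arch_mdwe_supported arch_mdwe_supported
#endif

/* Generic prctl handler: refuse to enable MDWE where it cannot work. */
int model_prctl_set_mdwe(unsigned long bits)
{
	if (bits && !arch_mdwe_supported())
		return -EINVAL;
	return 0;
}
```

Because the override defines the macro, the generic fallback is compiled out, and the generic code needs no knowledge of ARM or PARISC.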
      
      Link: https://lkml.kernel.org/r/20240227013546.15769-4-zev@bewilderbeest.net
      Link: https://lkml.kernel.org/r/20240227013546.15769-5-zev@bewilderbeest.net
      Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
      Acked-by: Helge Deller <deller@gmx.de>	[parisc]
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Florent Revest <revest@chromium.org>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ondrej Mosnacek <omosnace@redhat.com>
      Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Cc: Russell King (Oracle) <linux@armlinux.org.uk>
      Cc: Sam James <sam@gentoo.org>
      Cc: Stefan Roesch <shr@devkernel.io>
      Cc: Yang Shi <yang@os.amperecomputing.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: <stable@vger.kernel.org>	[6.3+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • MAINTAINERS: remove incorrect M: tag for dm-devel@lists.linux.dev · db09f2df
      Kuan-Wei Chiu authored
      The dm-devel@lists.linux.dev mailing list should only be listed under the
      L: (List) tag in the MAINTAINERS file.  However, it was incorrectly listed
      under both L: and M: (Maintainers) tags, which is not accurate.  Remove
      the M: tag for dm-devel@lists.linux.dev in the MAINTAINERS file to reflect
      the correct categorization.
      
      Link: https://lkml.kernel.org/r/20240319181842.249547-1-visitorckw@gmail.com
      Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
      Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
      Cc: Matthew Sakai <msakai@redhat.com>
      Cc: Michael Sclafani <dm-devel@lists.linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: zswap: fix kernel BUG in sg_init_one · 9c500835
      Barry Song authored
      sg_init_one() relies on virt_to_page(), which is only safe for linearly
      mapped low memory.  Passing it a highmem address triggers a kernel BUG:
      
      kernel BUG at include/linux/scatterlist.h:187!
      Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
      Modules linked in:
      CPU: 0 PID: 2997 Comm: syz-executor198 Not tainted 6.8.0-syzkaller #0
      Hardware name: ARM-Versatile Express
      PC is at sg_set_buf include/linux/scatterlist.h:187 [inline]
      PC is at sg_init_one+0x9c/0xa8 lib/scatterlist.c:143
      LR is at sg_init_table+0x2c/0x40 lib/scatterlist.c:128
      Backtrace:
      [<807e16ac>] (sg_init_one) from [<804c1824>] (zswap_decompress+0xbc/0x208 mm/zswap.c:1089)
       r7:83471c80 r6:def6d08c r5:844847d0 r4:ff7e7ef4
      [<804c1768>] (zswap_decompress) from [<804c4468>] (zswap_load+0x15c/0x198 mm/zswap.c:1637)
       r9:8446eb80 r8:8446eb80 r7:8446eb84 r6:def6d08c r5:00000001 r4:844847d0
      [<804c430c>] (zswap_load) from [<804b9644>] (swap_read_folio+0xa8/0x498 mm/page_io.c:518)
       r9:844ac800 r8:835e6c00 r7:00000000 r6:df955d4c r5:00000001 r4:def6d08c
      [<804b959c>] (swap_read_folio) from [<804bb064>] (swap_cluster_readahead+0x1c4/0x34c mm/swap_state.c:684)
       r10:00000000 r9:00000007 r8:df955d4b r7:00000000 r6:00000000 r5:00100cca
       r4:00000001
      [<804baea0>] (swap_cluster_readahead) from [<804bb3b8>] (swapin_readahead+0x68/0x4a8 mm/swap_state.c:904)
       r10:df955eb8 r9:00000000 r8:00100cca r7:84476480 r6:00000001 r5:00000000
       r4:00000001
      [<804bb350>] (swapin_readahead) from [<8047cde0>] (do_swap_page+0x200/0xcc4 mm/memory.c:4046)
       r10:00000040 r9:00000000 r8:844ac800 r7:84476480 r6:00000001 r5:00000000
       r4:df955eb8
      [<8047cbe0>] (do_swap_page) from [<8047e6c4>] (handle_pte_fault mm/memory.c:5301 [inline])
      [<8047cbe0>] (do_swap_page) from [<8047e6c4>] (__handle_mm_fault mm/memory.c:5439 [inline])
      [<8047cbe0>] (do_swap_page) from [<8047e6c4>] (handle_mm_fault+0x3d8/0x12b8 mm/memory.c:5604)
       r10:00000040 r9:842b3900 r8:7eb0d000 r7:84476480 r6:7eb0d000 r5:835e6c00
       r4:00000254
      [<8047e2ec>] (handle_mm_fault) from [<80215d28>] (do_page_fault+0x148/0x3a8 arch/arm/mm/fault.c:326)
       r10:00000007 r9:842b3900 r8:7eb0d000 r7:00000207 r6:00000254 r5:7eb0d9b4
       r4:df955fb0
      [<80215be0>] (do_page_fault) from [<80216170>] (do_DataAbort+0x38/0xa8 arch/arm/mm/fault.c:558)
       r10:7eb0da7c r9:00000000 r8:80215be0 r7:df955fb0 r6:7eb0d9b4 r5:00000207
       r4:8261d0e0
      [<80216138>] (do_DataAbort) from [<80200e3c>] (__dabt_usr+0x5c/0x60 arch/arm/kernel/entry-armv.S:427)
      Exception stack(0xdf955fb0 to 0xdf955ff8)
      5fa0:                                     00000000 00000000 22d5f800 0008d158
      5fc0: 00000000 7eb0d9a4 00000000 00000109 00000000 00000000 7eb0da7c 7eb0da3c
      5fe0: 00000000 7eb0d9a0 00000001 00066bd4 00000010 ffffffff
       r8:824a9044 r7:835e6c00 r6:ffffffff r5:00000010 r4:00066bd4
      Code: 1a000004 e1822003 e8860094 e89da8f0 (e7f001f2)
      ---[ end trace 0000000000000000 ]---
      ----------------
      Code disassembly (best guess):
         0:	1a000004 	bne	0x18
         4:	e1822003 	orr	r2, r2, r3
         8:	e8860094 	stm	r6, {r2, r4, r7}
         c:	e89da8f0 	ldm	sp, {r4, r5, r6, r7, fp, sp, pc}
      * 10:	e7f001f2 	udf	#18 <-- trapping instruction
      
      There are therefore two choices: either use kmap_to_page() together with
      sg_set_page(), or copy the high memory contents into a temporary buffer
      in low memory.  However, given the WARN_ON_ONCE introduced by commit
      ef6e06b2 ("highmem: fix kmap_to_page() for kmap_local_page()
      addresses"), which specifically addresses high memory concerns, memcpy
      appears to be the only viable option.
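
User space has no highmem, so the sketch below only models the decision: a hypothetical `linear` flag says whether the decompression source may go straight to an sg_init_one()-style helper, and otherwise the data is first copied into a preallocated low-memory buffer. `struct mapped_obj` and `sg_safe_src()` are illustrative names, not zswap's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a mapped compressed object. */
struct mapped_obj {
	const void *addr;
	size_t len;
	bool linear;	/* true if addr lies in the linear (lowmem) map */
};

/*
 * Return a pointer that is safe to hand to an sg_init_one()-style
 * helper: the mapping itself when it is linearly mapped, otherwise a
 * copy in the caller's lowmem bounce buffer (NULL if it doesn't fit).
 */
const void *sg_safe_src(const struct mapped_obj *obj,
			void *bounce, size_t bounce_len)
{
	if (obj->linear)
		return obj->addr;
	if (obj->len > bounce_len)
		return NULL;
	memcpy(bounce, obj->addr, obj->len);
	return bounce;
}
```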
      
      Link: https://lkml.kernel.org/r/20240318234706.95347-1-21cnbao@gmail.com
      Fixes: 270700dd ("mm/zswap: remove the memcpy if acomp is not sleepable")
      Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Reported-by: syzbot+adbc983a1588b7805de3@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/all/000000000000bbb3d80613f243a6@google.com/
      Tested-by: syzbot+adbc983a1588b7805de3@syzkaller.appspotmail.com
      Acked-by: Yosry Ahmed <yosryahmed@google.com>
      Reviewed-by: Nhat Pham <nphamcs@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests: mm: restore settings from only parent process · c52eb6db
      Muhammad Usama Anjum authored
      The atexit() handler runs in the parent process as well as in forked
      child processes, so a child restores the settings when it exits while
      the parent is still executing.  Fix this by checking the PID of the
      process running the atexit() handler and restoring the THP number only
      from the parent process.
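
A minimal sketch of the parent-only restore, assuming the PID is recorded when the handler is registered. `setup_restore()`, `restore_settings()`, `child_would_restore()` and the `restored` counter are hypothetical; the counter stands in for writing the saved value back.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t parent_pid;
static int restored;	/* stand-in for writing the saved setting back */

static bool is_parent(void)
{
	return getpid() == parent_pid;
}

/* atexit() handler: fork()ed children inherit it, so check who runs it. */
static void restore_settings(void)
{
	if (!is_parent())
		return;
	restored++;	/* e.g. write the saved nr_hugepages value back */
}

void setup_restore(void)
{
	parent_pid = getpid();
	atexit(restore_settings);
}

/* Fork a child and report whether it would restore (1) or not (0). */
int child_would_restore(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(is_parent() ? 1 : 0);
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```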
      
      Link: https://lkml.kernel.org/r/20240314094045.157149-1-usama.anjum@collabora.com
      Fixes: c23ea617 ("selftests/mm: protection_keys: save/restore nr_hugepages settings")
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Tested-by: Joey Gouly <joey.gouly@arm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • tools/Makefile: remove cgroup target · 950bf45d
      Cong Liu authored
      The tools/cgroup directory no longer contains a Makefile.  This patch
      updates the top-level tools/Makefile to remove references to building and
      installing cgroup components.  This change reflects the current structure
      of the tools directory and fixes the build failure when building tools in
      the top-level directory.
      
      linux/tools$ make cgroup
        DESCEND cgroup
      make[1]: *** No targets specified and no makefile found.  Stop.
      make: *** [Makefile:73: cgroup] Error 2
      
      Link: https://lkml.kernel.org/r/20240315012249.439639-1-liucong2@kylinos.cn
      Signed-off-by: Cong Liu <liucong2@kylinos.cn>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Reviewed-by: Dmitry Rokosov <ddrokosov@salutedevices.com>
      Cc: Cong Liu <liucong2@kylinos.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: cachestat: fix two shmem bugs · d5d39c70
      Johannes Weiner authored
      When cachestat on shmem races with swapping and invalidation, there
      are two possible bugs:
      
      1) A swapin error can have resulted in a poisoned swap entry in the
         shmem inode's xarray. Calling get_shadow_from_swap_cache() on it
         will result in an out-of-bounds access to swapper_spaces[].
      
         Validate the entry with non_swap_entry() before going further.
      
      2) When we find a valid swap entry in the shmem inode, the shadow
         entry in the swapcache might not exist: swap IO may still be in
         progress and we're before __remove_mapping, or swapin,
         invalidation, or swapoff may have removed the shadow from the
         swapcache after we saw the shmem swap entry.
      
         This will send a NULL to workingset_test_recent(). The latter
         purely operates on pointer bits, so it won't crash - node 0, memcg
         ID 0, eviction timestamp 0, etc. are all valid inputs - but it's a
         bogus test. In theory that could result in a false "recently
         evicted" count.
      
         Such a false positive wouldn't be the end of the world. But for
         code clarity and (future) robustness, be explicit about this case.
      
         Bail on get_shadow_from_swap_cache() returning NULL.
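
Both fixes amount to two early exits before the shadow is tested. A user-space model with hypothetical types (`struct swap_slot`, `struct shadow`) in place of swp_entry_t and the real shadow encoding: a poisoned slot and a missing shadow are both skipped instead of being looked up or tested.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a shadow entry's "recently evicted" verdict. */
struct shadow {
	bool recent;
};

/* Hypothetical model of a swapped-out shmem xarray slot. */
struct swap_slot {
	bool is_swap_entry;		/* false models a poisoned entry */
	const struct shadow *shadow;	/* NULL: nothing in the swapcache */
};

static bool count_recently_evicted(const struct swap_slot *slot)
{
	/* Bug 1: a swapin error leaves a poisoned entry; don't look it up. */
	if (!slot->is_swap_entry)
		return false;
	/* Bug 2: the shadow may be missing; bail instead of testing NULL. */
	if (!slot->shadow)
		return false;
	return slot->shadow->recent;
}

unsigned long count_recent(const struct swap_slot *slots, size_t n)
{
	unsigned long nr = 0;

	for (size_t i = 0; i < n; i++)
		nr += count_recently_evicted(&slots[i]);
	return nr;
}
```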
      
      Link: https://lkml.kernel.org/r/20240315095556.GC581298@cmpxchg.org
      Fixes: cf264e13 ("cachestat: implement cachestat syscall")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Chengming Zhou <chengming.zhou@linux.dev>	[Bug #1]
      Reported-by: Jann Horn <jannh@google.com>		[Bug #2]
      Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
      Reviewed-by: Nhat Pham <nphamcs@gmail.com>
      Cc: <stable@vger.kernel.org>				[v6.5+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: increase folio batch size · 9cecde80
      Matthew Wilcox (Oracle) authored
      On a 104-thread, 2-socket Skylake system, Intel reported a 4.7%
      performance reduction with will-it-scale page_fault2.  This was due to
      reducing the size of the batch from 32 to 15.  Increasing the folio
      batch size from 15 to 31 gives a performance increase of 12.5% relative
      to the original, or 17.2% relative to the commit that reduced
      performance.
      
      The penalty of this commit is an additional 128 bytes of stack usage.  Six
      folio_batches are also allocated from percpu memory in cpu_fbatches so
      that will be an additional 768 bytes of percpu memory (per CPU).  Tim Chen
      originally submitted a patch like this in 2020:
      https://lore.kernel.org/linux-mm/d1cc9f12a8ad6c2a52cb600d93b06b064f2bbc57.1593205965.git.tim.c.chen@linux.intel.com/
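
The stack and percpu figures follow from the pointer array: 16 extra `struct folio *` slots at 8 bytes each on a 64-bit build is 128 bytes per batch, and six batches give 6 × 128 = 768 bytes per CPU. A small C model that lets the compiler do the same arithmetic; the struct layout (a couple of byte-sized fields followed by the array) is a simplification, not the kernel's definition.

```c
#include <stddef.h>

/* Simplified model of a folio batch: small header, then the array. */
#define MODEL_BATCH(n)			\
	struct {			\
		unsigned char nr;	\
		_Bool drained;		\
		void *folios[n];	\
	}

typedef MODEL_BATCH(15) batch15;
typedef MODEL_BATCH(31) batch31;

/* Extra bytes per batch when growing from 15 to 31 entries. */
size_t extra_per_batch(void)
{
	return sizeof(batch31) - sizeof(batch15);
}

/* Extra per-CPU bytes for six such batches (as in cpu_fbatches). */
size_t extra_percpu(void)
{
	return 6 * extra_per_batch();
}
```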
      
      Link: https://lkml.kernel.org/r/20240315140823.2478146-1-willy@infradead.org
      Fixes: 99fbb6bf ("mm: make folios_put() the basis of release_pages()")
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Yujie Liu <yujie.liu@intel.com>
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202403151058.7048f6a8-oliver.sang@intel.com
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm,page_owner: fix recursion · 7844c014
      Oscar Salvador authored
      Prior to 217b2119 ("mm,page_owner: implement the tracking of the
      stacks count") the only place where page_owner could potentially go into
      recursion due to its need to allocate more memory was in save_stack(),
      which ends up calling into stackdepot code with the possibility of
      allocating memory.

      We made sure to guard against that by signaling that the current task was
      already in page_owner code, so in case a recursion attempt was made, we
      could catch that and return dummy_handle.

      After the above commit, a new place in page_owner code was introduced
      where we could allocate memory, meaning we could recurse if we took
      that path.

      Make sure to signal that we are in page_owner in that codepath as well.
      Move the guard code into two helpers, {un}set_current_in_page_owner(),
      and use them around the calls to the two functions that might allocate
      memory.
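
A user-space model of the guard pattern, assuming a thread-local flag stands in for current->in_page_owner and a fixed DUMMY_HANDLE stands in for the kernel's dummy_handle. `save_stack()` here is a toy that re-enters itself once through a fake allocation; it is not the page_owner implementation, only an illustration of why every allocating path must set the flag first.

```c
#include <stdbool.h>

#define DUMMY_HANDLE 1u	/* hypothetical stand-in for dummy_handle */

/* Per-thread flag standing in for current->in_page_owner. */
static _Thread_local bool in_page_owner;
static unsigned int next_handle = 100;
static unsigned int last_nested;	/* what the nested call returned */

static void set_current_in_page_owner(void)   { in_page_owner = true; }
static void unset_current_in_page_owner(void) { in_page_owner = false; }

static unsigned int save_stack(int depth);

/* Models an allocation that re-enters page_owner tracking. */
static unsigned int allocate_may_recurse(int depth)
{
	return depth > 0 ? save_stack(depth - 1) : 0;
}

/* A nested call sees the flag and short-circuits with DUMMY_HANDLE. */
static unsigned int save_stack(int depth)
{
	if (in_page_owner)
		return DUMMY_HANDLE;

	set_current_in_page_owner();
	last_nested = allocate_may_recurse(depth);
	unset_current_in_page_owner();

	return next_handle++;
}
```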
      
      Link: https://lkml.kernel.org/r/20240315222610.6870-1-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Fixes: 217b2119 ("mm,page_owner: implement the tracking of the stacks count")
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mailmap: update entry for Leonard Crestez · 32900324
      Leonard Crestez authored
      Put my personal email first because NXP employment ended some time ago.
      Also add my old Intel email address.
      
      Link: https://lkml.kernel.org/r/f568faa0-2380-4e93-a312-b80c1e367645@gmail.com
      Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • init: open /initrd.image with O_LARGEFILE · 4624b346
      John Sperbeck authored
      If the initrd data is larger than 2 GiB, writing to /initrd.image
      eventually fails once that limit is reached, unless the file is opened
      with O_LARGEFILE.
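
Without O_LARGEFILE a file is treated as non-large and writes are capped just below 2 GiB. A simplified model of that cap, assuming it mirrors the kernel's MAX_NON_LFS check and ignoring RLIMIT_FSIZE and per-filesystem size limits; `allowed_write()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2 GiB - 1, modelled on the kernel's MAX_NON_LFS limit. */
#define MODEL_MAX_NON_LFS ((INT64_C(1) << 31) - 1)

/*
 * How many of `count` bytes a write at offset `pos` may transfer.
 * Without O_LARGEFILE the write is clipped at the limit, and a write
 * starting at or beyond it transfers nothing (the kernel reports EFBIG).
 */
int64_t allowed_write(bool largefile, int64_t pos, int64_t count)
{
	if (largefile)
		return count;
	if (pos >= MODEL_MAX_NON_LFS)
		return 0;
	if (count > MODEL_MAX_NON_LFS - pos)
		return MODEL_MAX_NON_LFS - pos;
	return count;
}
```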
      
      Link: https://lkml.kernel.org/r/20240317221522.896040-1-jsperbeck@google.com
      Signed-off-by: John Sperbeck <jsperbeck@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: Fix build with _FORTIFY_SOURCE · 8b65ef5a
      Vitaly Chikunov authored
      Add the missing mode argument to open(2) calls that use O_CREAT.

      Some tests fail to compile if _FORTIFY_SOURCE is defined (to any valid
      value) together with -O, producing error messages similar to:
      
        In file included from /usr/include/fcntl.h:342,
                         from gup_test.c:1:
        In function 'open',
            inlined from 'main' at gup_test.c:206:10:
        /usr/include/bits/fcntl2.h:50:11: error: call to '__open_missing_mode' declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
           50 |           __open_missing_mode ();
              |           ^~~~~~~~~~~~~~~~~~~~~~
      
      _FORTIFY_SOURCE is enabled by default in some distributions, so on those
      distributions the tests fail to build and are skipped.

      The open(2) man page warns about the missing mode argument: "if it is
      not supplied, some arbitrary bytes from the stack will be applied as
      the file mode."
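
The fix itself is a one-argument change. A minimal sketch of the corrected call pattern, with `create_test_file()` as a hypothetical helper; 0644 is an arbitrary example mode, not necessarily what the selftests use.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * open(2) with O_CREAT needs a third "mode" argument; without it the
 * new file's permissions come from whatever value happens to occupy
 * that argument slot, and _FORTIFY_SOURCE rejects the call at build time.
 */
int create_test_file(const char *path)
{
	/* Wrong: open(path, O_RDWR | O_CREAT); */
	return open(path, O_RDWR | O_CREAT, 0644);
}
```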
      
      Link: https://lkml.kernel.org/r/20240318023445.3192922-1-vt@altlinux.org
      Fixes: aeb85ed4 ("tools/testing/selftests/vm/gup_benchmark.c: allow user specified file")
      Fixes: fbe37501 ("mm: huge_memory: debugfs for file-backed THP split")
      Fixes: c942f5bd ("selftests: soft-dirty: add test for mprotect")
      Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Keith Busch <kbusch@kernel.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/memory: fix missing pte marker for !page on pte zaps · f8572367
      Peter Xu authored
      Commit 0cf18e83 of the large folio zap work broke uffd-wp.  Now the mm
      uffd unit test "wp-unpopulated" triggers this WARN_ON_ONCE().

      The WARN_ON_ONCE() asserts that a VMA cannot be registered with
      userfaultfd-wp if it contains a !normal page, but that is actually
      possible.  For example, register an anonymous VMA with uffd-wp; reading
      from it installs a zero page, and zapping that range afterwards
      triggers the warning.

      What's more, removing that WARN_ON_ONCE may not be enough either,
      because we should also not rely on "whether it's a normal page" to
      decide whether a pte marker is needed.  For example, one can register
      write-protection over some DAX regions to track writes when
      UFFD_FEATURE_WP_ASYNC is enabled, in which case page can be NULL for a
      devmap but we may want to keep the marker around.
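
A simplified model of the point in the last paragraph: the marker decision looks at the VMA and at the PTE's uffd-wp state, not at whether a normal page is mapped. `struct zap_ctx` and `need_uffd_wp_marker()` are hypothetical; the anonymous-VMA exclusion is an assumption about the surrounding kernel logic, and swap entries and existing markers are left out.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified view of a PTE being zapped. */
struct zap_ctx {
	bool vma_anonymous;	/* assumed: anonymous VMAs get no marker */
	bool vma_uffd_wp;	/* VMA registered with userfaultfd-wp */
	bool pte_uffd_wp;	/* the PTE itself carried the uffd-wp bit */
	bool has_normal_page;	/* false for the zero page, DAX devmap, ... */
};

/*
 * Whether to leave a uffd-wp marker behind after zapping.
 * has_normal_page is deliberately not consulted.
 */
bool need_uffd_wp_marker(const struct zap_ctx *z)
{
	if (z->vma_anonymous || !z->vma_uffd_wp)
		return false;
	return z->pte_uffd_wp;
}
```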
      
      Link: https://lkml.kernel.org/r/20240313213107.235067-1-peterx@redhat.com
      Fixes: 0cf18e83 ("mm/memory: handle !page case in zap_present_pte() separately")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. 24 Mar, 2024 13 commits
    • Linux 6.9-rc1 · 4cece764
      Linus Torvalds authored
    • Merge tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · ab8de2db
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
      
       - Fix logic that is supposed to prevent placement of the kernel image
         below LOAD_PHYSICAL_ADDR
      
       - Use the firmware stack in the EFI stub when running in mixed mode
      
       - Clear BSS only once when using mixed mode
      
       - Check efi.get_variable() function pointer for NULL before trying to
         call it
      
      * tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efi: fix panic in kdump kernel
        x86/efistub: Don't clear BSS twice in mixed mode
        x86/efistub: Call mixed mode boot services on the firmware's stack
        efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
    • Merge tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5e74df2f
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
      
       - Ensure that the encryption mask at boot is properly propagated on
         5-level page tables, otherwise the PGD entry is incorrectly set to
         non-encrypted, which causes system crashes during boot.
      
       - Undo the deferred 5-level page table setup as it cannot work with
         memory encryption enabled.
      
       - Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset
         to the default value but the cached variable is not, so subsequent
         comparisons might yield the wrong result and, as a consequence,
         prevent the MSR from being updated.
      
       - Register the local APIC address only once in the MPPARSE enumeration
         to prevent triggering the related WARN_ONs() in the APIC and topology
         code.
      
       - Handle the case where no APIC is found gracefully by registering a
         fake APIC in the topology code. That makes all related topology
         functions work correctly and does not affect the actual APIC driver
         code at all.
      
       - Don't evaluate logical IDs during early boot as the local APIC IDs
         are not yet enumerated and the invoked function returns an error
         code. Nothing requires the logical IDs before the final CPUID
         enumeration takes place, which happens after the enumeration.
      
       - Cure the fallout of the per CPU rework on UP which misplaced the
         copying of boot_cpu_data to per CPU data so that the final update to
         boot_cpu_data got lost which caused inconsistent state and boot
         crashes.
      
       - Use copy_from_kernel_nofault() in the kprobes setup as there is no
         guarantee that the address can be safely accessed.
      
       - Reorder struct members in struct saved_context to work around another
         kmemleak false positive.

       - Remove the buggy code which tries to update the E820 kexec table for
         setup_data as that is never passed to the kexec kernel.

       - Update the resource control documentation to use the proper units.

       - Fix a Kconfig warning observed with tinyconfig.
      
      * tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/boot/64: Move 5-level paging global variable assignments back
        x86/boot/64: Apply encryption mask to 5-level pagetable update
        x86/cpu: Add model number for another Intel Arrow Lake mobile processor
        x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
        Documentation/x86: Document that resctrl bandwidth control units are MiB
        x86/mpparse: Register APIC address only once
        x86/topology: Handle the !APIC case gracefully
        x86/topology: Don't evaluate logical IDs during early boot
        x86/cpu: Ensure that CPU info updates are propagated on UP
        kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address
        x86/pm: Work around false positive kmemleak report in msr_build_context()
        x86/kexec: Do not update E820 kexec table for setup_data
        x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'
    • Linus Torvalds
      Merge tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b136f68e
      Linus Torvalds authored
      Pull scheduler doc clarification from Thomas Gleixner:
       "A single update for the documentation of the base_slice_ns tunable to
        clarify that any value which is less than the tick slice has no effect
        because the scheduler tick is not guaranteed to happen within the set
        time slice"
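
      As a rough illustration of the tick relation, the sketch below assumes
      a periodic tick of 1/CONFIG_HZ seconds and a slice that starts on a
      tick boundary, so that slice expiry is only noticed at the next tick.
      The helper names are hypothetical. This is a simplified model, not the
      scheduler's code.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Length of one scheduler tick in nanoseconds for a given CONFIG_HZ. */
static uint64_t tick_period_ns(unsigned int hz)
{
	assert(hz > 0);
	return NSEC_PER_SEC / hz;
}

/*
 * Simplified model (assumption): expiry is only detected on a tick, so the
 * slice actually enforced is the requested slice rounded up to whole ticks.
 */
static uint64_t enforced_slice_ns(uint64_t base_slice_ns, unsigned int hz)
{
	uint64_t tick = tick_period_ns(hz);

	return ((base_slice_ns + tick - 1) / tick) * tick;
}
```

      In this model, with CONFIG_HZ=250 (a 4 ms tick), a 0.75 ms base slice
      and a 3 ms base slice are both enforced as 4 ms. Lowering the value
      below the tick therefore changes nothing.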
      
      * tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation
    • Linus Torvalds
      Merge tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping · 864ad046
      Linus Torvalds authored
      Pull dma-mapping fixes from Christoph Hellwig:
       "This has a set of swiotlb alignment fixes for sometimes very
        long-standing bugs from Will. We've been discussing them for a while
        and they should be solid now"
      
      * tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping:
        swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE
        iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
        swiotlb: Fix alignment checks when both allocation and DMA masks are present
        swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
        swiotlb: Enforce page alignment in swiotlb_alloc()
        swiotlb: Fix double-allocation of slots due to broken alignment handling
    • Oleksandr Tymoshenko
      efi: fix panic in kdump kernel · 62b71cd7
      Oleksandr Tymoshenko authored
      Check if get_next_variable() is actually a valid pointer before
      calling it. In the kdump kernel this method is set to NULL, which
      causes a panic during the kexec-ed kernel boot.
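
      The fix amounts to guarding an optional firmware callback before
      invoking it. The following is a minimal sketch of that pattern. It
      assumes a hypothetical service table whose get_next_variable member
      may be left NULL, as the message describes for the kdump kernel. The
      struct, enum, and function names are illustrative and are not the
      kernel's efi API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for the firmware's status values. */
enum fake_efi_status { FAKE_EFI_SUCCESS, FAKE_EFI_UNSUPPORTED };

/* Hypothetical table of optional variable services. */
struct fake_efi_ops {
	enum fake_efi_status (*get_next_variable)(unsigned long *name_size);
};

/* Example implementation used when the service is present. */
static enum fake_efi_status stub_get_next_variable(unsigned long *name_size)
{
	*name_size = 0;
	return FAKE_EFI_SUCCESS;
}

/* Call the service only if the pointer was actually filled in. */
static enum fake_efi_status probe_get_next_variable(const struct fake_efi_ops *ops)
{
	unsigned long name_size = 0;

	assert(ops != NULL);
	if (ops->get_next_variable == NULL)	/* e.g. left unset in a kdump kernel */
		return FAKE_EFI_UNSUPPORTED;
	return ops->get_next_variable(&name_size);
}
```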
      
      Tested with QEMU and OVMF firmware.
      
      Fixes: bad267f9 ("efi: verify that variable services are supported")
      Signed-off-by: Oleksandr Tymoshenko <ovt@google.com>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    • Ard Biesheuvel
      x86/efistub: Don't clear BSS twice in mixed mode · df7ecce8
      Ard Biesheuvel authored
      Clearing BSS should only be done once, at the very beginning.
      efi_pe_entry() is the entrypoint from the firmware, which may not clear
      BSS and so it is done explicitly. However, efi_pe_entry() is also used
      as an entrypoint by the mixed mode startup code, in which case BSS will
      already have been cleared, and doing it again at this point will corrupt
      global variables holding the firmware's GDT/IDT and segment selectors.
      
      So make the memset() conditional on whether the EFI stub is running in
      native mode.
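
      The shape of the change can be sketched as follows. The sketch assumes
      a hypothetical flag that records whether the stub was entered natively
      or via the mixed-mode path. The names bss_area and stub_entry() are
      illustrative only. The real BSS is a linker-defined range, not an
      array.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the BSS region. */
static unsigned char bss_area[64];

/*
 * Entry point shared by the native firmware path and the mixed-mode path.
 * In mixed mode, earlier startup code already cleared BSS and then stored
 * state there (e.g. the firmware's GDT/IDT), so clearing again would wipe it.
 */
static void stub_entry(bool entered_natively)
{
	if (entered_natively)
		memset(bss_area, 0, sizeof(bss_area));
	/* ... continue with the rest of the stub ... */
}
```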
      
      Fixes: b3810c5a ("x86/efistub: Clear decompressor BSS in native EFI entrypoint")
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    • Ard Biesheuvel
      x86/efistub: Call mixed mode boot services on the firmware's stack · cefcd4fe
      Ard Biesheuvel authored
      Normally, the EFI stub calls into the EFI boot services using the stack
      that was live when the stub was entered. According to the UEFI spec,
      this stack needs to be at least 128k in size - this might seem large but
      all asynchronous processing and event handling in EFI runs from the same
      stack and so quite a lot of space may be used in practice.
      
      In mixed mode, the situation is a bit different: the bootloader calls
      the 32-bit EFI stub entry point, which calls the decompressor's 32-bit
      entry point, where the boot stack is set up, using a fixed allocation
      of 16k. This stack is still in use when the EFI stub is started in
      64-bit mode, and so all calls back into the EFI firmware will be using
      the decompressor's limited boot stack.
      
      Due to the placement of the boot stack right after the boot heap, any
      stack overruns have gone unnoticed. However, commit
      
        5c4feadb0011983b ("x86/decompressor: Move global symbol references to C code")
      
      moved the definition of the boot heap into C code, and now the boot
      stack is placed right at the base of BSS, where any overruns will
      corrupt the end of the .data section.
      
      While it would be possible to work around this by increasing the size of
      the boot stack, doing so would affect all x86 systems, and mixed mode
      systems are a tiny (and shrinking) fraction of the x86 installed base.
      
      So instead, record the firmware stack pointer value when entering from
      the 32-bit firmware, and switch to this stack every time an EFI boot
      service call is made.
      
      Cc: <stable@kernel.org> # v6.1+
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    • Tom Lendacky
      x86/boot/64: Move 5-level paging global variable assignments back · 9843231c
      Tom Lendacky authored
      Commit 63bed966 ("x86/startup_64: Defer assignment of 5-level paging
      global variables") moved assignment of 5-level global variables to later
      in the boot in order to avoid having to use RIP relative addressing in
      order to set them. However, when running with 5-level paging and SME
      active (mem_encrypt=on), the variables are needed as part of the page
      table setup needed to encrypt the kernel (using pgd_none(), p4d_offset(),
      etc.). Since the variables haven't been set, the page table manipulation
      is done as if 4-level paging is active, causing the system to crash on
      boot.
      
      While only a subset of the assignments that were moved need to be set
      early, move all of the assignments back into check_la57_support() so that
      these assignments aren't spread between two locations. Instead of just
      reverting the fix, this uses the new RIP_REL_REF() macro when assigning
      the variables.
      
      Fixes: 63bed966 ("x86/startup_64: Defer assignment of 5-level paging global variables")
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Link: https://lore.kernel.org/r/2ca419f4d0de719926fd82353f6751f717590a86.1711122067.git.thomas.lendacky@amd.com
    • Tom Lendacky
      x86/boot/64: Apply encryption mask to 5-level pagetable update · 4d0d7e78
      Tom Lendacky authored
      When running with 5-level page tables, the kernel mapping PGD entry is
      updated to point to the P4D table. The assignment uses _PAGE_TABLE_NOENC,
      which, when SME is active (mem_encrypt=on), results in a page table
      entry without the encryption mask set, causing the system to crash on
      boot.
      
      Change the assignment to use _PAGE_TABLE instead of _PAGE_TABLE_NOENC so
      that the encryption mask is set for the PGD entry.
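
      The sketch below shows why the choice of flag value matters. It
      assumes that SME marks an entry as encrypted by setting one physical
      address bit (the C-bit) through a runtime mask, and that the
      _PAGE_TABLE-style value includes that mask while the _NOENC variant
      omits it. The bit position, flag values, and names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encryption mask: a single high physical-address bit. */
static uint64_t fake_sme_me_mask = 1ULL << 47;

/* Hypothetical base flags: present | writable | user | accessed | dirty. */
#define FAKE_PAGE_TABLE_NOENC 0x067ULL
#define FAKE_PAGE_TABLE       (FAKE_PAGE_TABLE_NOENC | fake_sme_me_mask)

/* Build a PGD entry pointing at a P4D table with the given flags. */
static uint64_t make_pgd_entry(uint64_t p4d_phys, uint64_t flags)
{
	return p4d_phys | flags;
}
```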
      
      Fixes: 533568e0 ("x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]")
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Link: https://lore.kernel.org/r/8f20345cda7dbba2cf748b286e1bc00816fe649a.1711122067.git.thomas.lendacky@amd.com
    • Adamos Ttofari
      x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD · 10e4b516
      Adamos Ttofari authored
      Commit 67236547 ("x86/fpu: Update XFD state where required") and
      commit 8bf26758 ("x86/fpu: Add XFD state to fpstate") introduced a
      per CPU variable xfd_state to keep the MSR_IA32_XFD value cached, in
      order to avoid unnecessary writes to the MSR.
      
      On CPU hotplug MSR_IA32_XFD is reset to the init_fpstate.xfd, which
      wipes out any stale state. But the per CPU cached xfd value is not
      reset, which brings them out of sync.
      
      As a consequence a subsequent xfd_update_state() might fail to update
      the MSR which in turn can result in XRSTOR raising a #NM in kernel
      space, which crashes the kernel.
      
      To fix this, introduce xfd_set_state() to write xfd_state together
      with MSR_IA32_XFD, and use it in all places that set MSR_IA32_XFD.
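
      The following is a minimal model of the cached-register pattern and the
      fix. It uses plain arrays to simulate MSR_IA32_XFD and the per-CPU
      xfd_state cache. The function names echo the message, but their bodies,
      cpu_online_reset(), and the array sizes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SIM 4

/* Simulated hardware register (MSR_IA32_XFD) and its per-CPU software cache. */
static uint64_t msr_xfd[NR_CPUS_SIM];
static uint64_t xfd_state[NR_CPUS_SIM];

/* Write the register and the cache together so they cannot diverge. */
static void xfd_set_state(int cpu, uint64_t xfd)
{
	msr_xfd[cpu] = xfd;
	xfd_state[cpu] = xfd;
}

/* Skip the (expensive) register write when the cache says it is already set. */
static void xfd_update_state(int cpu, uint64_t xfd)
{
	if (xfd_state[cpu] != xfd)
		xfd_set_state(cpu, xfd);
}

/* CPU hotplug: the register is reset to its initial value. */
static void cpu_online_reset(int cpu, uint64_t init_xfd)
{
	/* The buggy variant wrote only msr_xfd[cpu], leaving the cache stale. */
	xfd_set_state(cpu, init_xfd);
}
```

      If cpu_online_reset() wrote only msr_xfd[], the cache would still hold
      the old value. The next xfd_update_state() with that value would then
      skip the write and leave the register out of sync.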
      
      Fixes: 67236547 ("x86/fpu: Update XFD state where required")
      Signed-off-by: Adamos Ttofari <attofari@amazon.de>
      Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20240322230439.456571-1-chang.seok.bae@intel.com
      
      Closes: https://lore.kernel.org/lkml/20230511152818.13839-1-attofari@amazon.de
    • Tony Luck
      Documentation/x86: Document that resctrl bandwidth control units are MiB · a8ed59a3
      Tony Luck authored
      The memory bandwidth software controller uses 2^20 units rather than
      10^6. See mbm_bw_count() which computes bandwidth using the "SZ_1M"
      Linux define for 0x00100000.
      
      Update the documentation to use MiB when describing this feature.
      It's too late to fix the mount option "mba_MBps" as that is now an
      established user interface.
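
      The following is a quick worked illustration of the unit difference. It
      assumes the controller divides a byte count by SZ_1M, as the message
      states. The helper names are hypothetical, and this is not
      mbm_bw_count() itself.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as the kernel's SZ_1M: 2^20 = 1,048,576 bytes. */
#define SZ_1M 0x00100000ULL

/* Bytes transferred in one second expressed in MiB/s (binary units). */
static uint64_t bytes_to_mibps(uint64_t bytes_per_sec)
{
	return bytes_per_sec / SZ_1M;
}

/* The same traffic expressed in decimal MB/s (10^6 bytes) for comparison. */
static uint64_t bytes_to_mbps(uint64_t bytes_per_sec)
{
	return bytes_per_sec / 1000000ULL;
}
```

      With these helpers, traffic of 1000 MiB/s reads as 1048 in decimal
      MB/s, a difference of roughly 4.9%.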
      
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20240322182016.196544-1-tony.luck@intel.com
  3. 23 Mar, 2024 11 commits
    • Linus Torvalds
      Merge tag 'timers-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 70293240
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "Two regression fixes for the timer and timer migration code:
      
         - Prevent endless timer requeuing caused by two CPUs racing out
           of idle. This happens when the last CPU goes idle and therefore
           has to ensure that the pending global timers are expired, while
           some other CPU comes out of idle at the same time, wins the race
           and expires the global queue. This causes the last CPU to chase
           ghost timers forever and to reprogram its clockevent device
           endlessly.
      
           Cure this by re-evaluating the wakeup time unconditionally.
      
         - The split into local (pinned) and global timers in the timer wheel
           caused a regression for NOHZ full as it broke the idle tracking of
           global timers. On NOHZ full this prevents a self IPI from being
           sent, which in turn causes the timer not to be programmed and not
           to expire on time.
      
           Restore the idle tracking for the global timer base so that the
           self IPI condition for NOHZ full is working correctly again"
      
      * tag 'timers-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timers: Fix removed self-IPI on global timer's enqueue in nohz_full
        timers/migration: Fix endless timer requeue after idle interrupts
    • Linus Torvalds
      Merge tag 'timers-core-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 00164f47
      Linus Torvalds authored
      Pull more clocksource updates from Thomas Gleixner:
       "A set of updates for clocksource and clockevent drivers:
      
         - A fix for the prescaler of the ARM global timer where the prescaler
           mask define only covered 4 bits while it is actually 8 bits wide.
           This obviously restricted the possible range of prescaler
           adjustments
      
         - A fix for the RISC-V timer which prevents a timer interrupt being
           raised while the timer is initialized
      
         - A set of device tree updates to support new system on chips in
           various drivers
      
         - Kernel-doc and other cleanups all over the place"
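
      Plain bit arithmetic shows the effect of the too-narrow prescaler mask.
      The sketch below assumes the prescaler field starts at bit 8 of the
      control register. It compares a 4-bit mask against the full 8-bit one.
      The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PRESCALER_SHIFT     8
#define PRESCALER_MASK_4BIT (0x0FU << PRESCALER_SHIFT)	/* old, too narrow */
#define PRESCALER_MASK_8BIT (0xFFU << PRESCALER_SHIFT)	/* full field width */

/* Insert a prescaler value into a control register image using a given mask. */
static uint32_t set_prescaler(uint32_t ctrl, uint32_t prescaler, uint32_t mask)
{
	ctrl &= ~mask;
	ctrl |= (prescaler << PRESCALER_SHIFT) & mask;
	return ctrl;
}

/* Read the prescaler back out of the register image. */
static uint32_t get_prescaler(uint32_t ctrl, uint32_t mask)
{
	return (ctrl & mask) >> PRESCALER_SHIFT;
}
```

      With the 4-bit mask, a requested prescaler of 0xC8 (200) silently
      becomes 0x8. No value above 15 can be programmed at all.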
      
      * tag 'timers-core-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/timer-riscv: Clear timer interrupt on timer initialization
        dt-bindings: timer: Add support for cadence TTC PWM
        clocksource/drivers/arm_global_timer: Simplify prescaler register access
        clocksource/drivers/arm_global_timer: Guard against division by zero
        clocksource/drivers/arm_global_timer: Make gt_target_rate unsigned long
        dt-bindings: timer: add Ralink SoCs system tick counter
        clocksource: arm_global_timer: fix non-kernel-doc comment
        clocksource/drivers/arm_global_timer: Remove stray tab
        clocksource/drivers/arm_global_timer: Fix maximum prescaler value
        clocksource/drivers/imx-sysctr: Add i.MX95 support
        clocksource/drivers/imx-sysctr: Drop use global variables
        dt-bindings: timer: nxp,sysctr-timer: support i.MX95
        dt-bindings: timer: renesas: ostm: Document RZ/Five SoC
        dt-bindings: timer: renesas,tmu: Document input capture interrupt
        clocksource/drivers/ti-32K: Fix misuse of "/**" comment
        clocksource/drivers/stm32: Fix all kernel-doc warnings
        dt-bindings: timer: exynos4210-mct: Add google,gs101-mct compatible
        clocksource/drivers/imx: Fix -Wunused-but-set-variable warning
    • Linus Torvalds
      Merge tag 'irq-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1a391931
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A series of fixes for the Renesas RZ/G2L interrupt chip driver to
        prevent spurious and misrouted interrupts.
      
         - Ensure that posted writes are flushed in the eoi() callback
      
         - Ensure that interrupts are masked at the chip level when the
           trigger type is changed
      
         - Clear the interrupt status register when setting up edge type
           trigger modes
      
         - Ensure that the trigger type and routing information are set
           before the interrupt is enabled"
      
      * tag 'irq-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/renesas-rzg2l: Do not set TIEN and TINT source at the same time
        irqchip/renesas-rzg2l: Prevent spurious interrupts when setting trigger type
        irqchip/renesas-rzg2l: Rename rzg2l_irq_eoi()
        irqchip/renesas-rzg2l: Rename rzg2l_tint_eoi()
        irqchip/renesas-rzg2l: Flush posted write in irq_eoi()
    • Linus Torvalds
      Merge tag 'core-entry-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 976b029d
      Linus Torvalds authored
      Pull core entry fix from Thomas Gleixner:
       "A single fix for the generic entry code:
      
        The trace_sys_enter() tracepoint can modify the syscall number via
        kprobes or BPF in pt_regs, but that requires that the syscall number
        is re-evaluated from pt_regs after the tracepoint.
      
        A seccomp fix in that area removed the re-evaluation so the change
        does not take effect as the code just uses the locally cached number.
      
        Restore the original behaviour by re-evaluating the syscall number
        after the tracepoint"
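
      The following is a simplified model of the two code shapes. It assumes
      a hypothetical register struct and a stand-in for a tracepoint consumer
      that overwrites the syscall number. None of these names are the
      kernel's entry-code identifiers.

```c
#include <assert.h>

/* Hypothetical, minimal stand-in for the saved user registers. */
struct fake_regs {
	long syscall_nr;
};

/* Models a tracepoint consumer (BPF or kprobe) rewriting the syscall number. */
static void fake_trace_sys_enter(struct fake_regs *regs, long new_nr)
{
	regs->syscall_nr = new_nr;
}

/* Regressed shape: keeps using the value cached before the tracepoint. */
static long entry_cached(struct fake_regs *regs, long rewrite_to)
{
	long nr = regs->syscall_nr;

	fake_trace_sys_enter(regs, rewrite_to);
	return nr;
}

/* Fixed shape: re-reads the number from the registers after the tracepoint. */
static long entry_reevaluated(struct fake_regs *regs, long rewrite_to)
{
	fake_trace_sys_enter(regs, rewrite_to);
	return regs->syscall_nr;
}
```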
      
      * tag 'core-entry-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        entry: Respect changes to system call number by trace_sys_enter()
    • Linus Torvalds
      Merge tag 'powerpc-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 484193fe
      Linus Torvalds authored
      Pull more powerpc updates from Michael Ellerman:
      
       - Handle errors in mark_rodata_ro() and mark_initmem_nx()
      
       - Make struct crash_mem available without CONFIG_CRASH_DUMP
      
      Thanks to Christophe Leroy and Hari Bathini.
      
      * tag 'powerpc-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency
        powerpc/kexec: split CONFIG_KEXEC_FILE and CONFIG_CRASH_DUMP
        kexec/kdump: make struct crash_mem available without CONFIG_CRASH_DUMP
        powerpc: Handle error in mark_rodata_ro() and mark_initmem_nx()
    • Linus Torvalds
      Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 02fb638b
      Linus Torvalds authored
      Pull ARM updates from Russell King:
      
       - remove a misuse of kernel-doc comment
      
       - use "Call trace:" for backtraces like other architectures
      
       - implement copy_from_kernel_nofault_allowed() to fix an LKDTM test
      
       - add a "cut here" line for prefetch aborts
      
       - remove unnecessary Kconfig entry for FRAME_POINTER
      
       - remove iwmmxt support for PJ4/PJ4B cores
      
       - use bitfield helpers in ptrace to improve readability
      
       - check if folio is reserved before flushing
      
      * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: 9359/1: flush: check if the folio is reserved for no-mapping addresses
        ARM: 9354/1: ptrace: Use bitfield helpers
        ARM: 9352/1: iwmmxt: Remove support for PJ4/PJ4B cores
        ARM: 9353/1: remove unneeded entry for CONFIG_FRAME_POINTER
        ARM: 9351/1: fault: Add "cut here" line for prefetch aborts
        ARM: 9350/1: fault: Implement copy_from_kernel_nofault_allowed()
        ARM: 9349/1: unwind: Add missing "Call trace:" line
        ARM: 9334/1: mm: init: remove misuse of kernel-doc comment
    • Linus Torvalds
      Merge tag 'hardening-v6.9-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · b7187139
      Linus Torvalds authored
      Pull more hardening updates from Kees Cook:
      
       - CONFIG_MEMCPY_SLOW_KUNIT_TEST is no longer needed (Guenter Roeck)
      
       - Fix needless UTF-8 character in arch/Kconfig (Liu Song)
      
       - Improve __counted_by warning message in LKDTM (Nathan Chancellor)
      
       - Refactor DEFINE_FLEX() for default use of __counted_by
      
       - Disable signed integer overflow sanitizer on GCC < 8
      
      * tag 'hardening-v6.9-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        lkdtm/bugs: Improve warning message for compilers without counted_by support
        overflow: Change DEFINE_FLEX to take __counted_by member
        Revert "kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST"
        arch/Kconfig: eliminate needless UTF-8 character in Kconfig help
        ubsan: Disable signed integer overflow sanitizer on GCC < 8
    • Thomas Gleixner
      x86/mpparse: Register APIC address only once · f2208aa1
      Thomas Gleixner authored
      The APIC address is registered twice. First during the early detection and
      afterwards when actually scanning the table for APIC IDs. The APIC and
      topology core warn about the second attempt.
      
      Restrict it to the early detection call.
      
      Fixes: 81287ad6 ("x86/apic: Sanitize APIC address setup")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/r/20240322185305.297774848@linutronix.de
    • Thomas Gleixner
      x86/topology: Handle the !APIC case gracefully · 5e25eb25
      Thomas Gleixner authored
      If there is no local APIC enumerated and registered then the topology
      bitmaps are empty. Therefore, topology_init_possible_cpus() will die with
      a division by zero exception.
      
      Prevent this by registering a fake APIC ID to populate the topology
      bitmap. This also allows all topology query interfaces to be used
      unconditionally. It does not affect the actual APIC code because either
      the local APIC address was not registered or no local APIC could be
      detected.
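
      The following is a minimal model of the idea. It uses a hypothetical
      counter and array in place of the kernel's topology bitmaps. If nothing
      was registered, init registers a placeholder ID. Any query that divides
      by the number of registered APICs is then safe. All names are
      illustrative.

```c
#include <assert.h>

#define MAX_APICS 8

/* Hypothetical bookkeeping: APIC IDs seen during enumeration. */
static unsigned int registered_apic_ids[MAX_APICS];
static unsigned int nr_registered;

static void register_apic_id(unsigned int apic_id)
{
	assert(nr_registered < MAX_APICS);
	registered_apic_ids[nr_registered++] = apic_id;
}

/* Topology init: ensure at least one entry exists before anything divides by the count. */
static void topology_init_sketch(void)
{
	if (nr_registered == 0)
		register_apic_id(0);	/* fake APIC ID for the !APIC case */
}

/* Example query that would divide by zero if nothing had been registered. */
static unsigned int threads_per_registered_apic(unsigned int nr_threads)
{
	assert(nr_registered != 0);
	return nr_threads / nr_registered;
}
```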
      
      Fixes: f1f758a8 ("x86/topology: Add a mechanism to track topology via APIC IDs")
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/r/20240322185305.242709302@linutronix.de
    • Thomas Gleixner
      x86/topology: Don't evaluate logical IDs during early boot · 7af541ce
      Thomas Gleixner authored
      The local APICs have not yet been enumerated so the logical ID evaluation
      from the topology bitmaps does not work and would return an error code.
      
      Skip the evaluation during the early boot CPUID evaluation and only apply
      it on the final run.
      
      Fixes: 380414be ("x86/cpu/topology: Use topology logical mapping mechanism")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/r/20240322185305.186943142@linutronix.de
    • Thomas Gleixner
      x86/cpu: Ensure that CPU info updates are propagated on UP · c90399fb
      Thomas Gleixner authored
      The boot sequence evaluates CPUID information twice:
      
        1) During early boot
      
        2) When finalizing the early setup right before
           mitigations are selected and alternatives are patched.
      
      In both cases the evaluation is stored in boot_cpu_data, but on UP the
      copying of boot_cpu_data to the per CPU info of the boot CPU happens
      between #1 and #2. So any update which happens in #2 is never propagated to
      the per CPU info instance.
      
      Consolidate the whole logic and copy boot_cpu_data right before applying
      alternatives as that's the point where boot_cpu_data is in its final
      state and not supposed to change anymore.
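
      The ordering problem can be modelled in a few lines. The sketch uses a
      hypothetical, trimmed-down CPU info struct and stand-in functions for
      the two CPUID evaluations. It shows how an update made after the copy
      never reaches the per-CPU instance. None of these identifiers are the
      kernel's.

```c
#include <assert.h>

/* Hypothetical, trimmed-down CPU info record. */
struct cpuinfo_sketch {
	int feature_bits;
};

static struct cpuinfo_sketch boot_data;		/* plays boot_cpu_data */
static struct cpuinfo_sketch per_cpu_copy;	/* boot CPU's per-CPU info */

static void identify_early(void) { boot_data.feature_bits = 0x1; }
static void identify_final(void) { boot_data.feature_bits |= 0x2; }
static void copy_to_per_cpu(void) { per_cpu_copy = boot_data; }

/* Old UP ordering: the copy lands between the two evaluations. */
static void boot_old_order(void)
{
	identify_early();
	copy_to_per_cpu();
	identify_final();	/* this update never reaches per_cpu_copy */
}

/* Fixed ordering: copy once the data is final, right before alternatives. */
static void boot_new_order(void)
{
	identify_early();
	identify_final();
	copy_to_per_cpu();
	/* alternatives patching would follow here */
}
```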
      
      This also removes the voodoo mb() from smp_prepare_cpus_common() which
      had absolutely no purpose.
      
      Fixes: 71eb4893 ("x86/percpu: Cure per CPU madness on UP")
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/r/20240322185305.127642785@linutronix.de